
Full text of "Mathematics for Engineers & Scientists ( 1st ed.)"

Mathematics for 
Engineers and 
Scientists 



Alan Jeffrey 




Mathematics for 




Applications of Mathematics Series 

Editor: Alan Jeffrey 

Professor of Engineering Mathematics 

University of Newcastle-upon-Tyne 

William F. Ames 

Numerical Methods for Partial Differential Equations 

T. J. M. Boyd and J. J. Sanderson 

Plasma Dynamics 

C. D. Green 

Integral Equation Methods 

I. H. Hall 

Deformation of Solids 

Jeremy Hirschhorn 

Dynamics of Machinery 

Alan Jeffrey 

Mathematics for Engineers and Scientists 

Brian Porter 

Synthesis of Dynamical Systems 



Engineers 
and Scientists 

Alan Jeffrey 

University of Newcastle-upon-Tyne 




NELSON 






Thomas Nelson and Sons Ltd 
36 Park Street London W1Y 4DE 

Nelson (Africa) Ltd 

PO Box 18123 Nairobi Kenya 

Thomas Nelson (Australia) Ltd 

171-175 Bank Street South Melbourne Victoria 3205 

Thomas Nelson and Sons (Canada) Ltd 
81 Curlew Drive Don Mills Ontario 

Thomas Nelson (Nigeria) Ltd 
PO Box 336 Apapa Lagos 

First published in Great Britain by Thomas Nelson and Sons Ltd., 1969 
Reprinted with amendments 1971 
Reprinted 1973 

Copyright © Alan Jeffrey 1969, 1971 

All Rights Reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by
any means, electronic, mechanical, photocopying, recording or otherwise,
without the prior permission of the publishers.

. 761605 9 (Boards) 
17 771604 5 (Paper) 

Reproduced and printed by photolithography and bound in 
Great Britain at The Pitman Press, Bath 







Preface 



This book has evolved from an introductory course in mathematics given to 
engineering students at the University of Newcastle-upon-Tyne during the 
last few years. It represents the author's attempt to offer the engineering 
student, and the science student who is not majoring in a mathematical 
aspect of his subject, a broad and modern account of those parts of mathe- 
matics that are finding increasingly important application in the everyday 
development of his subject. 

Although this book does not seek to teach any of the many physical 
disciplines to which its results and methods may be applied, it nevertheless 
makes free use of them for purposes of illustration whenever this seems to be 
helpful. Every effort has been made to integrate the various chapters into a 
description of mathematics as a single subject, and not as a collection of 
seemingly unrelated topics. Thus, for example, matrices are not only intro- 
duced in an algebraic context, but they are also related in other chapters to 
change of variables in partial differentiation and to the study of simultaneous 
differential equations. 

Modern notation and terminology have been used freely but, it is hoped, 
never to the point of becoming pedantic when a simple word or phrase seems 
more natural. Of necessity, much of the material in this book is standard, 
though the emphasis and manner of introduction and presentation frequently 
differs from that found elsewhere. This is deliberate, and is a reflection of 
the changing importance of mathematical topics in engineering and science 
to-day. 

In many introductory mathematics texts for engineering and science 
students no serious attempt is made to offer reasonable proofs of main 
results and, instead, attention is largely confined to their manipulation. 
Important though this aspect undoubtedly is, it is the author's belief that 
knowledge of the proof of a result is often as essential as its subsequent 
application, and that the modern student needs and merits both. With this 
thought in mind proofs of results have always been included, and, though 
they have been kept as simple as possible, no attempt has been made to 
conceal difficulty where it exists. Only very occasionally, when the proof of 
a result is lengthy, and its details are largely irrelevant to the subsequent 
development of the argument, has the treatment been shortened to a summary 
of the logical steps involved. Even then the interested reader can often find 
more relevant information amongst the specially selected problems at the 
end of each chapter. 

As implied by the previous remark, the many problems not only comprise 
those offering manipulative exercise, but also those shedding further light
on topics only touched upon in the main text. No serious student can progress 
in his knowledge of this subject without a proper investment of time and effort 
spent working at a selection of these problems. The main text is provided with 
numerous illustrative examples designed to be helpful both when working 
through the text and when attempting the classified problems. It is hoped 
that their inclusion also makes the book suitable for private study. 

The wide range of material covered in this book represents rather more 
than would normally be contained in an introductory course of lectures. 
Whilst allowing for changing approaches in teaching, this fact also permits 
some flexibility in use of the material and at the same time offers further 
relevant reading to the ambitious student. In addition to the author's own 
experience of the application of mathematics in engineering and science, the 
choice and style of presentation of material has been influenced by two 
recently published documents: the Council of Engineering Institutions 
syllabuses in mathematics in Britain and the CUPM recommendations made 
by the Mathematical Association of America. It is the author's hope that 
this book complies fully with the former document and with the spirit of the 
latter insofar as its recommendations are applicable to engineering and 
science students. 

The material has all been class-tested and, as a result, has undergone 
considerable modification from its first appearance as lecture notes to the 
form of presentation adopted here. It is a pleasure to acknowledge the help 
of the publishers who have given me continued encouragement and every 
possible form of assistance throughout the entire period of preparation of 
the book. 

A. J.



As a direct result of requests by users of the first printing of this book it 
was decided that a short chapter on Fourier Series should be added. The 
present revised imprint contains this new material and also incorporates a 
number of small corrections drawn to the author's attention by various kind 
readers. 

A. J. 



Contents 



1 Introduction to Sets and Numbers
1-1 Sets and algebra
1-2 Set theory and probability
1-3 Integers, rationals and arithmetic laws
1-4 Absolute value of a real number
1-5 Representation of numbers
1-6 Mathematical induction
Problems

2 Variables, Functions, and Mappings
2-1 Variables and functions
2-2 Inverse functions
2-3 Some special functions
2-4 Digression on mappings
2-5 Curves and parameters
2-6 Functions of several real variables
Problems

3 Sequences, Limits, and Continuity
3-1 Sequences
3-2 Limits of sequences
3-3 The number e
3-4 Limits of functions - continuity
3-5 Functions of several variables - limits, continuity
3-6 A useful connecting theorem
Problems

4 Complex Numbers and Vectors
4-1 Introductory ideas
4-2 Basic algebraic rules for complex numbers
4-3 Complex numbers as vectors
4-4 Modulus-argument form of complex numbers
4-5 Roots of complex numbers
4-6 Introduction to space vectors
4-7 Scalar and vector products
4-8 Geometrical applications
4-9 Applications to mechanics
Problems

5 Differentiation of Functions of One or More Real Variables
5-1 The derivative
5-2 Rules of differentiation
5-3 Some important consequences of differentiability
5-4 Higher derivatives - applications
5-5 Partial differentiation
5-6 Total differential
5-7 Envelopes
5-8 The chain rule and its consequences
5-9 Change of variable
5-10 Implicit functions
5-11 Higher order partial derivatives
Problems

6 Exponential, Hyperbolic, and Logarithmic Functions
6-1 The exponential function
6-2 Differentiation of functions involving the exponential function
6-3 The logarithmic function
6-4 Hyperbolic functions
6-5 Exponential function with a complex argument
Problems

7 Fundamentals of Integration
7-1 Definite integrals and areas
7-2 Integration of arbitrary continuous functions
7-3 Integral inequalities
7-4 The definite integral as a function of its upper limit - indefinite integral
7-5 Differentiation of an integral containing a parameter
7-6 Other geometrical applications of definite integrals
7-7 Numerical integration
Problems

8 Systematic Integration
8-1 Integration of elementary functions
8-2 Integration by substitution
8-3 Integration by parts
8-4 Reduction formulae
8-5 Integration of rational functions - partial fractions
8-6 Other special techniques of integration
Problems

9 Linear Transformations and Matrices
9-1 Introductory ideas
9-2 Matrix algebra
9-3 Determinants
9-4 Linear dependence and linear independence
9-5 Inverse and adjoint matrix
9-6 Matrix functions of a single variable
9-7 Solution of systems of linear equations
9-8 Eigenvalues and eigenvectors
9-9 Linear transformations
9-10 Applications of matrices and linear transformations
Problems

10 Functions of a Complex Variable
10-1 Sequences of complex numbers and limits
10-2 Curves and regions
10-3 Function of a complex variable, limits and continuity
10-4 Derivatives - Cauchy-Riemann equations
10-5 Conformal mapping
10-6 Applications of conformal mapping
Problems

11 Scalars, Vectors, and Fields
11-1 Curves in space
11-2 Antiderivatives and integrals of vector functions
11-3 Some applications
11-4 Fields, gradient, and directional derivative
11-5 An application to fluid mechanics
Problems

12 Series, Taylor's Theorem and its Uses
12-1 Series
12-2 Power series
12-3 Taylor's theorem
12-4 Application of Taylor's theorem
12-5 Applications of the generalized mean value theorem
Problems

13 Differential Equations and Geometry
13-1 Introductory ideas
13-2 Possible physical origin of some equations
13-3 Arbitrary constants and initial conditions
13-4 Properties of solutions - isoclines
13-5 Orthogonal trajectories
13-6 Modified Euler method
13-7 A simple predictor-corrector method
Problems

14 First Order Differential Equations
14-1 Equations with separable variables
14-2 Homogeneous equations
14-3 Exact equations
14-4 The linear equation of first order
14-5 Equations with implicit dependence on x
14-6 Clairaut's and Lagrange's equations
14-7 Picard's iterative method
14-8 Direct deductions and comparison theorems
Problems

15 Higher Order Differential Equations
15-1 Linear equations with constant coefficients - homogeneous case
15-2 Linear equations with constant coefficients - inhomogeneous case
15-3 Variation of parameters
15-4 Simultaneous linear differential equations
15-5 Series solution of differential equations
15-6 Runge-Kutta method
15-7 Oscillatory solutions
15-8 Coupled oscillations and normal modes
15-9 The Laplace transform
Problems

16 Fourier Series
16-1 Introductory ideas
16-2 Convergence of Fourier series
16-3 Different forms of Fourier series
16-4 Differentiation and Integration
Problems

Answers to selected problems

Index



Introduction to sets and 
numbers 



1-1 Sets and algebra

In applications of mathematics to engineering and science, we often use the 
properties of real numbers. Many of these properties are intuitively obvious, 
but others are more subtle and depend for their proper use on a simple 
understanding of the mathematical basis of the so-called real number system. 
This chapter describes the elements of the real number system in a straight- 
forward manner for subsequent use throughout the book. 

The reader will certainly know how to work with finite combinations of 
numbers, but what is less certain is whether he understands how to interpret 
and use limiting processes. For example, what is the meaning and what, if 
any, is the value to be associated with the limit

$$\lim_{n \to \infty} \left[\ \cdots\ \right]$$

which is to be interpreted as the value approached by the expression in square
brackets as n increases without bound?

It was questions such as these and, indeed, far simpler ones that first led 
to the study of real numbers. Many properties of numbers, nowadays accepted 
by all as self-evident, were once regarded as questionable. This is still clearly 
apparent from much of the notation that is in current use. 

Thus, for example, the fact that $\sqrt{2}$ cannot be expressed as the ratio of
two integers led to its being termed an irrational number. Even more extreme
is the term imaginary number that is given to $\sqrt{-1}$. Although, as we shall
see later, this number does not belong to the real number system and so
merits special consideration, it is however no less real than the integer 2.

Experience suggests that in any systematic development of the properties 
of the real number system, the operations of addition and multiplication must 
play a fundamental role. These conjectures are of course true, but underlying 
the idea of real numbers and their algebraic manipulation are the even more 
fundamental concepts of sets and their associated algebra. Because these 
notions are sometimes unfamiliar, we shall start by considering some simple 
but important ideas concerning sets. 

We must first define the term set for which the alternative terms aggregate, 
class, and collection are also often used. Our approach will be direct and 
pragmatic and we shall agree that a set comprises a collection of objects or 
elements, each of which is chosen for membership of the set because it
possesses some required property. Membership of the set is determined en- 
tirely by this property; an object only belongs to the set if it possesses the 
required property, otherwise it does not belong to the set. The properties of 
membership and non-membership of a set are mutually exclusive. 

An important numerical set which we shall often have occasion to use is
the set N of natural numbers 1, 2, 3, . . ., used in counting. In future the
symbol N will always be used to signify this natural set of positive integers.
Notice that there can be no greatest member m of this set, since however
large m may be, m + 1 is larger and yet is also a member of the set N.
Accordingly, when we use a number m that is allowed to increase without
restriction, it will be convenient to imply this by saying that 'm tends to
infinity', and to write the statement in the form $m \to \infty$. Notice that infinity
is not a number in the usual sense, but just the outcome of the mathematical
process of allowing m to increase without bound. It is always necessary to
relate the symbol $\infty$ to some mathematical expression, since by itself it has
little or no meaning.

N is only one type of set however, and from the wording of our definition 
it is apparent that the elements of a set need not be numerical. Thus in statistics 
one is concerned with sets of events which may or may not be numerical, 
whereas in the analysis of logical operations one is concerned with sets of 
decisions. The notation and simple algebra we now develop are applicable to 
all sets and, hence, to any situations such as those just enumerated which are 
capable of description in terms of sets. 

To simplify the manipulation of these ideas we must introduce a notation 
for elements of a set, for sets themselves, and for the membership of an 
element to a set. It is customary to denote general elements of sets by lower 
case letters a, b, . . ., x, . . ., and sets themselves by capital letters A, B, 
. . ., S, . . .. If a is a member of set A we shall write 

$a \in A$.

This is usually read 'a is an element of A'. Conversely, if a is not an element of
A we shall write

$a \notin A$.

In this notation we have $3 \in N$, but $\pi \notin N$, where $\pi = 3.1415\ldots$, and N is
the set of natural numbers.

If a set only contains a small number of elements it is often simplest to
define it by enumerating the elements. Hence, for a set S comprising the four
integer elements 3, 4, 5, and 6 we would write S = {3, 4, 5, 6}. This set is a
finite set in the sense that it comprises a finite number of elements. Conversely,
the set N of natural numbers is an infinite set since it contains an
infinite number of elements.

Often it is useful to have a notation which indicates the membership 
criterion that is to be used for the set. Thus, if we were interested in the set B
of positive integers n whose squares lie between the positive numbers m and 
2m, we would write 

$B = \{n \mid n \in N,\ m < n^2 < 2m\}$.

Here we have used the convention that the symbol n to the left of the vertical
rule signifies a general element of the set in question, whilst the expressions to
the right of the rule express the membership criteria for the set. There, of
course, the symbol < when used in conjunction with numbers a and b in
the form a < b is to be read 'a less than b'.
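
As an illustrative sketch that is not part of the original text, the membership
criterion for the set B can be mirrored by a Python set comprehension. The
function name `squares_between` and the sample value m = 10 are assumptions
made purely for illustration. The search is finite because any positive integer
n with n squared less than 2m is itself less than 2m.

```python
def squares_between(m):
    """Return B = {n | n in N, m < n**2 < 2*m} for a positive number m."""
    # For n >= 1 we have n <= n**2, so n**2 < 2*m forces n < 2*m:
    # searching 1, 2, ..., int(2*m) + 1 cannot miss a member.
    upper = int(2 * m) + 1
    return {n for n in range(1, upper + 1) if m < n * n < 2 * m}


# Hypothetical sample value: positive integers whose squares lie
# strictly between 10 and 20.
B = squares_between(10)
print(B)
```

The comprehension has the same two parts as the set notation: the general
element to the left of `for`, and the membership criteria after `if`.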

An important set that is frequently used is the set of ordered pairs. An 
element of this set will be written (m, n), where m and n are not necessarily 
numbers and the element (m, n) is different from the element (n, m) unless 
m and n are identical. An important use of this set is in the construction of 
tables, when the ordered pair becomes an ordered number pair, the first 
member of which is usually the argument and the second member the functional
value. Hence the ordered number pair $(\pi/6, 0.5)$ could refer to the
sine of the angle $\pi/6$ radians. In this example the relationship between the
first and second numbers of the ordered pair is determinate since $\sin(\pi/6) = 0.5$,
but this is not always the case with ordered pairs. Thus if the ordered pair of 
integers (m, n) were used to describe the throw of a die in a series of N 
trials, as the statistician would call them, then m could represent the number 
of the throw or the trial number, and n the score resulting from that throw. 
Here m would range from unity to N, the number of trials in the statistical 
experiment, and n would be any integer between 1 and 6. There would then 
be no rule by which n could be predicted for any given m. 
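
The distinction between determinate and indeterminate ordered pairs may be
sketched with Python tuples, which compare entry by entry just as ordered
pairs do. This is only an illustration under stated assumptions: the names
`sine_table` and `throws`, the choice of three angles, the rounding to four
places, and the simulated die are not taken from the text.

```python
import math
import random

# Order matters: (m, n) differs from (n, m) unless m and n are identical.
assert (1, 2) != (2, 1)
assert (3, 3) == (3, 3)

# Determinate pairs: the second member is fixed by the first
# (a very small table of sines, rounded to four decimal places).
angles = [0.0, math.pi / 6, math.pi / 2]
sine_table = {(x, round(math.sin(x), 4)) for x in angles}

# Indeterminate pairs: (trial number, score) for N throws of a die.
rng = random.Random(0)  # fixed seed only so that a run can be repeated
N = 5
throws = [(m, rng.randint(1, 6)) for m in range(1, N + 1)]
print(throws)
```

Nothing in the first entry of a pair in `throws` allows the second entry to be
predicted, whereas each pair in `sine_table` is fixed by its argument.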

Ordered number pairs are also encountered when constructing graphs of
functions where the convention is usually that (a, b) signifies the point with
x-coordinate a and y-coordinate b. Thus the graph of the function y = f(x)
for which x is between a and b could be written in set notation

$S = \{(x, f(x)) \mid a < x < b\}$.

The notation of an ordered pair as an element of a set readily extends to 
an ordered triple (m, n, r), which again need not necessarily involve numerical 
quantities, nor need it be determinate. Again, two ordered triples will only be 
identical if their corresponding entries are identical. Ordered number triples 
of a determinate kind occur when considering the graph of a function of two 
independent variables as, for example, the equilibrium temperature at a 
given point of a cross-section of a very long metal bar. 

Statistical events provide the most common source of ordered triples of 
the indeterminate variety. As a simple illustration we may consider the 
statistical experiment comprising N trials, each of which involves tossing a 
coin twice and recording the results of each throw as a 'head' (H) or a 'tail' 
(T). Then the first quantity in the ordered triple could record the trial number 
with the second and third quantities recording an H or a T according as the
first and second throws gave rise to a 'head' or a 'tail'. A typical ordered 
triple would then be (3, T, H) in which the second and third entries in the 
ordered triple cannot be predicted from a knowledge of the first entry. 

It is often necessary to study relationships between sets and for this pur- 
pose an algebra of sets must be constructed. The simplest situation that can 
occur is that from a set A, a new set B is formed, such that all elements 
of B are also elements of A. Such a set B will be called a subset of A. This 
result will be written 

$B \subseteq A$,

which is to be read 'B is a subset of A'.

If x is an element of A, so that we may write $x \in A$, then either $x \in B$,
or $x \notin B$. When there are some elements $x \in A$ which are not to be found in
B, so that $x \notin B$, then B is called a proper subset of A, the result being written

$B \subset A$.

The definition of a subset B of A does not preclude the possibility that 
for every element $x \in A$ it is also true that $x \in B$. When this occurs sets A
and B have the same elements and are said to be equal, the result being
written

A = B.

It is clear from the definition of equality that when A = B both the
statements $A \subseteq B$ and $B \subseteq A$ must be true. These last two statements are
often useful as an alternative definition of equality between sets.

With the above definitions it is clear that if A = N and B = {1, 2, 3, 4, 5},
then $B \subset A$; whereas if A = {4, 7, 3, 5, 9} and B = {7, 4, 5, 9, 3}, then
$A \subseteq B$ and $B \subseteq A$ so that A = B.
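
These relations can be tried out with Python's built-in `set` type, in which
`<=` tests for a subset and `<` for a proper subset. The sketch below is offered
only as an illustration; since the infinite set N cannot be stored, the finite
stand-in `N_upto_100` is an assumption introduced here.

```python
A = {4, 7, 3, 5, 9}
B = {7, 4, 5, 9, 3}

# Equality via the alternative definition: each set is a subset of the other.
equal_by_inclusion = A <= B and B <= A
print(equal_by_inclusion, A == B)

# A finite stand-in for N (an assumption: the natural numbers up to 100).
N_upto_100 = set(range(1, 101))
C = {1, 2, 3, 4, 5}

print(C <= N_upto_100)  # C is a subset of the stand-in for N
print(C < N_upto_100)   # and indeed a proper subset
print(A < B)            # equal sets are not proper subsets of one another
```

The order in which elements are listed plays no part in any of these tests.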

A more general situation arises when two sets A and B are involved, each
of which possesses elements which are not common to the other so that
neither statement $A \subset B$, nor $B \subset A$ is true. The set of elements C that is
common to these two sets A and B will be called the intersection of the sets
A and B and is written

$C = A \cap B$.

Sometimes this is read 'A cap B' with the understanding just defined.

In the event that there are no elements common to the sets A and B we
shall write

$A \cap B = \emptyset$,

with the understanding that $\emptyset$ is the null set, which we define to be the set
containing no elements. Under these circumstances the sets A and B are said
to be disjoint.

By way of example, if $A_1 = \{a, b, 1, 3, 5, 7\}$ and $B_1 = \{a, c, d, e, 3, 7, 9\}$,
then $A_1 \cap B_1 = \{a, 3, 7\}$; whereas if $A_2 = \{1, 3, 7\}$ and $B_2 = \{0, 4, 9, 11\}$,
$A_2 \cap B_2 = \emptyset$.

Fig. 1-1 Symbolic representation of set operations: (a) proper subset, $A \subset B$;
(b) intersection, $A \cap B$; (c) union, $A \cup B$.

Another important set related to sets A and B is the set C containing all
the elements belonging to A, to B or to both A and B. This is called the union
of sets A and B and is written

$C = A \cup B$,

which reads 'A cup B'. With the sets defined above we obviously have
$A_1 \cup B_1 = \{a, b, c, d, e, 1, 3, 5, 7, 9\}$ and $A_2 \cup B_2 = \{0, 1, 3, 4, 7, 9, 11\}$.
Clearly, for any set A we have $\emptyset \subseteq A$, $A \cup \emptyset = A$, and $A \cap \emptyset = \emptyset$.
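
The numerical examples of intersection and union given in the text may be
reproduced with Python sets, where `&` forms the intersection and `|` the
union. This is a sketch for illustration only; representing the letter elements
as one-character strings, and the name `empty` for the null set, are
assumptions of the sketch.

```python
A1 = {"a", "b", 1, 3, 5, 7}
B1 = {"a", "c", "d", "e", 3, 7, 9}
A2 = {1, 3, 7}
B2 = {0, 4, 9, 11}
empty = set()  # plays the part of the null set

print(A1 & B1)            # intersection of A1 and B1
print(A2 & B2 == empty)   # A2 and B2 have no common elements
print(A2.isdisjoint(B2))  # the same fact, asked directly
print(A1 | B1)            # union of A1 and B1
print(A2 | B2)            # union of A2 and B2

# The three closing remarks about the null set, checked for A1.
print(empty <= A1, A1 | empty == A1, A1 & empty == empty)
```

Python does not display the elements of a set in any guaranteed order, which
is in keeping with the definition of equality between sets.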

These seemingly abstract ideas can be illustrated symbolically by means 
of a very convenient device. This is the so-called Venn diagram, which uses a
pictorial representation for the sets in question. Sets are represented by the
interior of closed curves, usually of arbitrary shape, and their relationship is
then illustrated by the relationships that exist between these curves. Thus,
when as in Fig. 1-1 (a) curve A representing set A lies within curve B representing
set B, we have the situation that A is a proper subset of B, so that
$A \subset B$. Figs 1-1 (b), (c) illustrate, respectively, the intersection $A \cap B$ and
the union $A \cup B$ of sets A and B, which are shown as shaded areas on those
figures.




Fig. 1-2 Sets in plane: (a) intersection, $A \cap B$; (b) union, $A \cup B$.



In general this representation is only symbolic, but in the event that 
elements of the sets A and B may be unambiguously represented by points 
in the plane, the Venn diagrams become true representations. 

Let set A comprise all the points within and on a circle of unit radius,
usually called a unit circle, and centred on the origin, and let B comprise all
the points within and on the circle of radius 2 centred on the point x = 2.5
on the x-axis. Then the relationships $A \cap B$ and $A \cup B$ are truly represented
by the shaded areas in Figs 1-2 (a), (b).
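
Sets of points in the plane contain infinitely many elements, so they cannot be
enumerated, but their membership criteria can still be written down and
combined. The following sketch, with the hypothetical function names `in_A`,
`in_B`, `in_intersection`, and `in_union`, tests whether a given point (x, y)
belongs to the two discs just described.

```python
import math


def in_A(x, y):
    """Points within and on the unit circle centred on the origin."""
    return math.hypot(x, y) <= 1.0


def in_B(x, y):
    """Points within and on the circle of radius 2 centred on (2.5, 0)."""
    return math.hypot(x - 2.5, y) <= 2.0


def in_intersection(x, y):
    """Membership of the intersection: the point lies in both A and B."""
    return in_A(x, y) and in_B(x, y)


def in_union(x, y):
    """Membership of the union: the point lies in A, in B, or in both."""
    return in_A(x, y) or in_B(x, y)


print(in_intersection(0.75, 0.0))  # a point in the overlap of the discs
print(in_intersection(0.0, 0.0))   # the origin lies in A but not in B
print(in_union(0.0, 0.0))
```

Intersection corresponds to the logical 'and' of the two criteria, and union to
the logical 'or'.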

Similarly, if we consider the sets A and B defined by the interiors and
boundaries of the two unit circles illustrated in Figs 1-3 (a), (b), we see that in
(a), $A \cap B = \{1\}$, so that only the single point x = 1 on the x-axis is common
to A and B, whereas in (b), $A \cap B = \emptyset$.

Fig. 1-3 Intersection of sets in the plane: (a) single point contained in intersection,
$A \cap B = \{1\}$; (b) disjoint sets, $A \cap B = \emptyset$.

A final idea we now introduce in connection with sets A and B is the
complement of B relative to A, which we shall write as A\B. This is a generalization
of the notion of subtraction and comprises the set of elements of A
that do not belong to B. The expression A\B is usually read 'A minus B'
and if, for example, A = {a, 1, 3, 7} and B = {a, 7, 9, 11} then A\B = {1, 3}.
Appealing again to a Venn diagram, we illustrate this relationship by the
shaded region in Fig. 1-4.




Fig. 1-4 Symbolic representation of complement of B relative to A, A\B.




The following useful results are almost self-evident and are true for 
arbitrary sets A, B, and C. They may be proved either from the basic defini- 
tions, or by appeal to Venn diagrams. 

Basic set operations 

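$A \cup A = A \cap A = A$, (1-1)

$A \cap B = B \cap A$, (1-2)

$A \cup B = B \cup A$, (1-3)

$(A \cup B) \cup C = A \cup (B \cup C)$, (1-4)

$(A \cap B) \cap C = A \cap (B \cap C)$, (1-5)

$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$, (1-6)

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$. (1-7)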
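
Although a numerical check is no substitute for a proof, results (1-1) to (1-7)
can be tested on randomly chosen finite sets. The sketch below does this for
subsets of a small universe; the helper names `random_subset` and `laws_hold`,
the universe of ten integers, and the 200 repetitions are all assumptions made
for illustration.

```python
import random

rng = random.Random(1)  # fixed seed only so that a run can be repeated
universe = range(10)


def random_subset():
    """Include each element of the small universe with probability 1/2."""
    return {x for x in universe if rng.random() < 0.5}


def laws_hold(A, B, C):
    """Check results (1-1) to (1-7) for three particular sets."""
    return all([
        (A | A == A) and (A & A == A),        # (1-1)
        A & B == B & A,                       # (1-2)
        A | B == B | A,                       # (1-3)
        (A | B) | C == A | (B | C),           # (1-4)
        (A & B) & C == A & (B & C),           # (1-5)
        A | (B & C) == (A | B) & (A | C),     # (1-6)
        A & (B | C) == (A & B) | (A & C),     # (1-7)
    ])


all_ok = all(
    laws_hold(random_subset(), random_subset(), random_subset())
    for _ in range(200)
)
print(all_ok)
```

A failure for even one triple of sets would disprove a result, whereas any
number of successes only lends it support.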
From these there follows an important theorem due to De Morgan: 

Theorem 1-1 For any three arbitrary sets A, B, and C it is true that

$A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)$

and

$A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C)$.

Proof An analytical proof of the first stated result involves the following
two steps: (a) the proof that if x is an arbitrary element such that
$x \in A \setminus (B \cup C)$, then $x \in (A \setminus B)$ and $x \in (A \setminus C)$, showing that

$A \setminus (B \cup C) \subseteq (A \setminus B) \cap (A \setminus C)$;

and (b) the proof that if $x \in (A \setminus B)$ and $x \in (A \setminus C)$, then $x \in A \setminus (B \cup C)$,
showing that

$(A \setminus B) \cap (A \setminus C) \subseteq A \setminus (B \cup C)$.

Then by our alternative definition of the equality of two sets P and Q,
whereby P = Q if $P \subseteq Q$ and $Q \subseteq P$, the result will follow. The details,
which are not difficult, are left to the reader. The proof of the second stated
result follows on similar lines.
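
For sets drawn from a small finite universe the theorem can also be confirmed
exhaustively. In the sketch below the relative complement A\B is formed with
Python's set difference operator `-`; the helper names `all_subsets` and
`de_morgan_holds`, and the three-element universe, are assumptions chosen for
illustration. Such a check covers only the sets examined and does not replace
the analytical proof.

```python
from itertools import combinations


def all_subsets(universe):
    """Return every subset of a finite universe as a list of sets."""
    items = list(universe)
    return [
        set(chosen)
        for size in range(len(items) + 1)
        for chosen in combinations(items, size)
    ]


def de_morgan_holds(A, B, C):
    """Check both statements of Theorem 1-1 for three particular sets."""
    first = A - (B | C) == (A - B) & (A - C)
    second = A - (B & C) == (A - B) | (A - C)
    return first and second


subsets = all_subsets({1, 2, 3})  # 8 subsets, hence 8**3 = 512 triples
verified = all(
    de_morgan_holds(A, B, C)
    for A in subsets
    for B in subsets
    for C in subsets
)
print(verified)
```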

The theorem may be illustrated in general terms, and proved for sets 
which may be represented by points in a plane, by the use of Venn diagrams. 
The three diagrams appropriate to the first stated result are shown in Figs 
1-5 (a), (b), and (c), where the shaded regions represent the sets A\B, A\C,
and $A \setminus (B \cup C)$, respectively.

Fig. 1-5 Representation of De Morgan's theorem: (a) A\B; (b) A\C; (c) $A \setminus (B \cup C)$.

The reader will have noticed that it is a feature of basic set operations
that they essentially combine two sets to generate a third in an unambiguous
manner. It is because of this simple property that operations such as union 
and intersection are called binary set operations, the term 'binary' referring 
to the two sets on which the set operation is performed to generate the third. 
Thus the operation $\cap$ acting on any two sets A and B generates a third set
$C = A \cap B$ where, of course, C will be the null set if A and B have no common
elements.

Theorem 1-1 illustrates that operations on sets are not always as simple 
as the formation of the union or intersection of sets. Accordingly, it is neces- 
sary to appreciate clearly the implication of any statement that may be made 
in the derivation of a result. These statements may either be 'one way' implica- 
tions or 'two way' implications in the following sense. An implication will 
be said to be one way if it is a simple statement of the form 'result A implies
result B'. This statement is usually written symbolically in the concise form

$A \Rightarrow B$.

A two way implication arises if from the above statement it also follows
that 'result B implies result A', so that in addition to the previous statement
it is also permissible to write

$B \Rightarrow A$.

Rather than write for a two way implication the two results $A \Rightarrow B$ and
$B \Rightarrow A$, the notation is contracted so that the two way implication may be
written concisely in the form

$A \Leftrightarrow B$.

The symbol $\Leftrightarrow$ is usually read 'implies and is implied by'.

Two simple illustrations using sets of integers should clarify these remarks. 
We can only write 

$a = 1 \Rightarrow a$ is an integer,

since the converse statement, a is an integer, does not imply that a = 1.
However, we may obviously write

integer n contains a factor 2 $\Leftrightarrow$ n is an even integer.

Formal development of these and similar ideas is essential if the logical
structure of mathematics is to be fully appreciated, though these matters
will not be pursued further in this introductory account.
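
One way of seeing the difference between the two kinds of implication is to
collect, from some finite domain, the elements for which each statement is
true. A one way implication then appears as a subset relation between the two
collections, and a two way implication as their equality. The following sketch
rests on assumptions introduced only for illustration: the domain of
half-integers from -3 to 3, the range of integers from -10 to 10, and all of
the variable names.

```python
from fractions import Fraction

# An assumed finite domain of rational numbers: -3, -5/2, ..., 5/2, 3.
domain = [Fraction(k, 2) for k in range(-6, 7)]

equals_one = {a for a in domain if a == 1}
is_integer = {a for a in domain if a.denominator == 1}

one_way = equals_one <= is_integer   # a = 1 implies that a is an integer
converse = is_integer <= equals_one  # an integer a need not equal 1

# The second illustration, on an assumed finite range of integers.
integers = range(-10, 11)
has_factor_2 = {n for n in integers if any(n == 2 * k for k in range(-5, 6))}
is_even = {n for n in integers if n % 2 == 0}
two_way = has_factor_2 == is_even

print(one_way, converse, two_way)
```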

1-2 Set theory and probability

One of the most direct applications of the elements of set theory is to be 
found in a formal introduction to probability theory. Because the notion of 
a probability is fundamental to many branches of engineering and science we 
choose to introduce some basic ideas and definitions now, making full use 
of the notions of set theory. This will serve a dual purpose in that it will 
provide an excellent illustration of a specific application of set theory, whilst 
at the same time introducing an important concept at the very outset of our 
study. 

In some situations the outcome of an experiment is not determinate, so 
one of several possible events may occur. Following statistical practice we 
shall refer to an individual event of this kind as the result or outcome of a 
trial, whereas an agreed number of trials, say N, will be said to constitute an 
experiment. If an experiment comprises throwing a die N times, then a trial 
would involve throwing it once and the outcome of a trial would be the score 
that was recorded as a result of the throw. The experiment would involve 
recording the outcome of each of the N trials. 

In general, if a trial has m outcomes we shall denote them by
$E_1, E_2, \ldots, E_m$ and refer to each as a simple event. Hence a trial involving
tossing a coin would have only two simple events as outcomes: namely 'heads',
which could be labelled $E_1$, and 'tails', which would then be labelled $E_2$. In this
instance an experiment would be a record of the outcomes from a given
number of such trials. A typical record of an experiment involving tossing a
coin eight times would be $E_1, E_2, E_1, E_1, E_1, E_2, E_2, E_1$. With such a simple
experiment the $E_1$, $E_2$ notation has no apparent advantage over writing H in
place of $E_1$ and T in place of $E_2$ to obtain the equivalent record H, T, H, H,
H, T, T, H. The advantage of the $E_i$ notation accrues from the fact that the
subscript attached to the E may be ordered numerically, thereby enabling
easier manipulation of the outcomes during analysis.

Events such as the result of tossing a coin or throwing a die are called 
chance or random events, since they are indeterminate and are supposedly 
the consequence of unbiased chance effects. Experience suggests that the 
relative frequency of occurrence of each such event averaged over a series of 
similar experiments tends to a definite value as the number of experiments 
increases. 

The relative frequency of occurrence of the simple event $E_i$ in a series of
N trials is thus given by the expression

$$\frac{\text{Number of occurrences of event } E_i}{N}.$$

By virtue of its definition, this ratio can never be negative and can never
exceed unity. For any given N, this ratio provides an estimate of the theoretical
ratio that would have been obtained were N to have been made arbitrarily
large. This theoretical ratio will be called the probability of occurrence of
event $E_i$ and will be written $P(E_i)$. In many simple situations its value may be
arrived at by making reasonable postulates concerning the mechanisms
involved in a trial. Thus when fairly tossing an unbiased coin it would be
reasonable to suppose that over a large number of trials the number of
'heads' would closely approximate the number of 'tails' so that $P(H) = P(T)
= 1/2$. Here, of course, P(H) signifies the probability of occurrence of a 'head'
and P(T) signifies the probability of occurrence of a 'tail'.
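
The tendency of the relative frequency ratio towards a definite value can be illustrated numerically. The following is only a sketch: it assumes that a pseudo-random number generator is an adequate stand-in for a fairly tossed coin, and the function name `relative_frequency` and the fixed seed are illustrative choices, not part of the text.

```python
import random

def relative_frequency(num_trials, seed=0):
    """Fraction of 'heads' in num_trials simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    heads = sum(1 for _ in range(num_trials) if rng.random() < 0.5)
    return heads / num_trials

# The ratio should settle near 1/2 as the number of trials N grows.
for n in (10, 100, 1000, 10000, 100000):
    print(n, relative_frequency(n))
```

For small N the printed ratio may differ appreciably from 1/2; it is only for large N that it is expected to lie close to the postulated probability.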

If there are m outcomes $E_1, E_2, \ldots, E_m$ of a trial, and they occur with
the respective frequencies $n_1, n_2, \ldots, n_m$ in a series of N trials, then we have
the obvious identity

$$\frac{n_1 + n_2 + \cdots + n_m}{N} = 1.$$

When N becomes arbitrarily large we may interpret each of the relative
frequency ratios $n_i/N$ ($i = 1, 2, \ldots, m$) occurring on the left-hand side as
the probability of occurrence $P(E_i)$ of event $E_i$, thereby giving rise to the
general result

$$P(E_1) + P(E_2) + \cdots + P(E_m) = 1. \tag{1-8}$$

By this time a careful reader will have noticed that the definition of 
probability adopted here has a logical difficulty associated with it, namely, 
the question whether a relative frequency ratio such as $n_i/N$ can be said to
approach a definite number as N becomes arbitrarily large. We shall not 
attempt to discuss this philosophical point more fully, but rather be content 
that our simple definition in terms of the relative frequency ratio is in accord 
with everyday experience. 

An examination of Eqn (1-8) and its associated relative frequency ratios 
is instructive. It shows the obvious results that: 

(a) if event $E_i$ never occurs, then $n_i = 0$ and $P(E_i) = 0$;

(b) if event $E_i$ is certain to occur, then $n_i = N$ and $P(E_i) = 1$;

(c) if event $E_i$ occurs less frequently than event $E_j$, then $n_i < n_j$ and
$P(E_i) < P(E_j)$;

(d) if the m possible events $E_1, E_2, \ldots, E_m$ occur with equal frequency,
then $n_1 = n_2 = \cdots = n_m = N/m$ and $P(E_1) = P(E_2) = \cdots = P(E_m)
= 1/m$.

The relationship between sets and probability begins to emerge once it is
appreciated that a trial having m different outcomes is simply a rule by which
an event may be classified unambiguously as belonging to one of m different 
sets. Often a geometrical analogy may be used to advantage when representing 
the different outcomes of a particular trial and such an approach then leads 
directly to a representation closely approximating the Venn diagrams of the 
previous section. 

A convenient example is provided by the simple experiment which involves 
throwing two dice and recording their individual scores. There will be in 
all 36 possible outcomes which may be recorded as the ordered number 
pairs (1, 1), (1, 2), (1, 3), . . ., (2, 1), (2, 2), . . ., (6, 5), (6, 6). Here the first 
integer in the ordered number pair represents the score on die 1 and the second 
the score on die 2. These may be plotted as 36 points with integer coordinates 
as shown in Fig. 1-6 (a). 



[Fig. 1-6: panels (a) and (b); horizontal axis: score on die 1; vertical axis: score on die 2.]

Fig. 1-6 Sample space for two dice: (a) complete sample space; (b) sample space
for specific outcome.



Because each of the indicated points in Fig. 1-6 (a) lies in a two-dimensional
geometrical space (that is, they are specific points in a plane),
and in their totality they describe all possible outcomes, the representation is 
usually called the sample space of events. The probability of occurrence of 
an event characterized by a point in the sample space is, of course, the 
probability of occurrence of the simple event it represents. 

As a sample space will require a 'dimension' for each of its variables it 
is immediately apparent that only in simple cases can it be represented 
graphically. Nevertheless the idea is still useful, as was that of the Venn 
diagram even when it was only symbolic. 

The points in the sample space may be regarded as defining points in a
set D so that specific requirements as to the outcome of a trial will define a
subset A of D, at each point of which the required event will occur. Typical
of this situation would be the case in which a simple event is the throw of two
dice, and the requirement defining the subset is that the combined score after
throwing the two dice equals or exceeds 8. Here the set D would be the 36
points within the square in Fig. 1-6 (b) and the set A the 15 points within
the triangle. Using set notation we may write $A \subset D$.

The sample space representation becomes particularly valuable when 
trials are considered whose outcome depends on the combination of events 
belonging to two different subsets A and B of the sample space. Thus, again 
using our previous example and taking for A the points within the triangle in 
Fig. 1-6 (b), the points in B might be determined by the requirement that the 
combined score be divisible by the integer 3. The set of points B is then those 
contained within the dotted curves of Fig. 1-6 (b). 

A new set C may be derived from two sets A and B in two essentially 
different ways according as : 

(a) C contains points in A or B or both; 

(b) C contains points in A and B. 

If desired, these statements about sets may be rewritten as statements 
about events. This is so because there is an unambiguous relationship between 
an event and the set of points S in the sample space at which that event occurs.
Thus, for example, we may paraphrase the first statement by saying, the event 
corresponding to points in C denotes the occurrence of the events corresponding 
to points in A or B, or both. Because of this relationship it is often convenient 
to regard an event and the subset of points it defines in the sample space as 
being synonymous. 

The statements provide yet another connection with set theory, since in
(a) we may obviously write $C = A \cup B$, whereas in (b) we must write
$C = A \cap B$. In terms of the sets A and B defined in connection with Fig.
1-6 (b), the set $C = A \cup B$ contains the points in the triangle together with
those within the two dotted curves exterior to the triangle. The set $C = A \cap B$
contains only the five points within the two dotted curves lying inside the
triangle.

Here it should be remarked that the statistician usually avoids the set 
theory symbols $\cup$ and $\cap$, preferring instead to denote the union of A and B
by A + B and their intersection by AB. This largely arises because of the 
duality we have already mentioned that exists between an event and the set 
of points it defines; the statistician naturally preferring to think in terms of 
events rather than sets. However, to emphasize the connection with set 
theory we shall preserve the set theory notation. 

Using this duality we now denote by P(A) the probability that an event
corresponding to a point in the sample space lies within subset A, and define
its value to be as follows:

Definition 1-1

P(A) is the sum of the probabilities associated with every point belonging
to the subset A.

In Fig. 1-6 (b) the set A contains the 15 points within the triangle and,
since for unbiased dice each point in the sample space is equally probable,
it follows at once that the probability 1/36 is to be associated with each of
these points. Hence from our definition we see that in this case, $P(A)
= 15 \times (1/36) = 5/12$. Similarly, for the set B comprising the 12 points
contained within the dotted curves we have $P(B) = 12 \times (1/36) = 1/3$.
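
The counting behind these two values can be checked by listing the sample space explicitly. The sketch below assumes unbiased dice, so that every point carries probability 1/36; the names `D`, `A`, `B` follow the text, while the helper `prob` is introduced here purely for illustration.

```python
from fractions import Fraction

# Sample space D: all ordered pairs (score on die 1, score on die 2).
D = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(subset):
    """P(subset) when every point of D is equally probable (1/36 each)."""
    return Fraction(len(subset), len(D))

A = [pt for pt in D if sum(pt) >= 8]      # combined score equals or exceeds 8
B = [pt for pt in D if sum(pt) % 3 == 0]  # combined score divisible by 3

print(len(D), len(A), len(B))  # 36 15 12
print(prob(A), prob(B))        # 5/12 1/3
```

Exact fractions are used rather than floating-point numbers so that the results may be compared directly with the values quoted in the text.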

We can now introduce the idea of a conditional probability through the 
following definition. 

Definition 1-2

$P(A \mid B)$ is the conditional probability that an event known to be associated
with set B is also associated with set A.

Clearly we are only interested in the relationship that exists between A and
B, with B now playing the part of a sample space. Because in Definition 1-2
B plays the part of a sample space, but is itself only a subset of the complete
sample space, it is sometimes given the name of the reduced sample space.
In terms of set theory Definition 1-2 is easily seen to be equivalent to

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \tag{1-9}$$

which immediately shows us how $P(A \mid B)$ may be computed. Namely,
$P(A \mid B)$ is obtained by dividing the sum of the probabilities at points belonging
to the intersection $A \cap B$ of sets A and B by the sum of the probabilities at
points belonging to B. This ensures that $P(B \mid B) = 1$ as would be expected.

We can illustrate this by again appealing to the sets A and B defined in
connection with Fig. 1-6 (b). It has already been established that $P(B) = 1/3$,
and since there are only five points in $A \cap B$, each with a probability 1/36,
it follows that $P(A \cap B) = 5/36$. Hence $P(A \mid B) = (5/36)/(1/3) = 5/12$. This
result expressed in words states that when two dice are thrown and their
score is divisible by the integer 3, then the probability that it also equals or
exceeds 8 is 5/12.
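
The same conditional probability may be obtained by treating B as the reduced sample space and counting points. This is a minimal sketch under the assumption of equally probable points; the function name `conditional` is hypothetical and simply restates the ratio of the probability of the intersection to the probability of B.

```python
from fractions import Fraction

D = {(i, j) for i in range(1, 7) for j in range(1, 7)}
A = {pt for pt in D if sum(pt) >= 8}      # combined score equals or exceeds 8
B = {pt for pt in D if sum(pt) % 3 == 0}  # combined score divisible by 3

def prob(subset):
    """P(subset) for equally probable points of D."""
    return Fraction(len(subset), len(D))

def conditional(a, b):
    """P(a given b): probability of the intersection divided by P(b)."""
    return prob(a & b) / prob(b)

print(prob(A & B))        # 5/36
print(conditional(A, B))  # 5/12
print(conditional(B, B))  # 1
```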

A direct consequence of Eqn (1-9) is the so-called probability multiplication
rule:

Theorem 1-2 If two events define subsets A and B of a sample space, then

$$P(A \cap B) = P(B)\,P(A \mid B).$$

Sometimes, when it is given that the event corresponding to points in
subset B occurs, it is also true that $P(A \mid B)$ depends only on A, so that
$P(A \mid B) = P(A)$. The events giving rise to subsets A and B will then be said
to be independent. The probability multiplication rule then simplifies in an
obvious manner which we express as follows:

Corollary 1-2 If the events giving rise to subsets A and B of a sample space
are independent, then

$$P(A \cap B) = P(A)\,P(B).$$

Consideration of the interpretation of $P(A \cup B)$ leads to another important
result known as the probability addition rule:

Theorem 1-3 If two events define subsets A and B of a sample space, then

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

The proof of this theorem is self-evident once it is remarked that when
computing P(A) and P(B) from subsets A and B and then forming the expression
$P(A) + P(B)$, the sum of probabilities at points in the intersection
$A \cap B$ is counted twice. Hence $P(A) + P(B)$ exceeds $P(A \cup B)$ by an amount
$P(A \cap B)$.

The probability addition rule also has an important special case when
sets A and B are disjoint so that $A \cap B = \emptyset$. When this occurs the events
corresponding to sets A and B are said to be mutually exclusive and we express
the result as follows:

Corollary 1-3 If the events giving rise to subsets A and B of a sample space
are mutually exclusive, then

$$P(A \cup B) = P(A) + P(B).$$

As a simple illustration of Theorem 1-3 we again use the sets A and B
defined in connection with Fig. 1-6 (b) to compute $P(A \cup B)$. The result is
immediate for we have already obtained the results $P(A) = 5/12$, $P(B) = 1/3$,
and $P(A \cap B) = 5/36$, so from Theorem 1-3 follows the result

$$P(A \cup B) = 5/12 + 1/3 - 5/36 = 11/18.$$
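
As a check, the union may be counted directly and compared with the value given by the addition rule. The sketch again assumes equally probable points and uses an illustrative helper `prob`; the final line also tests whether these particular sets satisfy the product condition of Corollary 1-2.

```python
from fractions import Fraction

D = {(i, j) for i in range(1, 7) for j in range(1, 7)}
A = {pt for pt in D if sum(pt) >= 8}      # combined score equals or exceeds 8
B = {pt for pt in D if sum(pt) % 3 == 0}  # combined score divisible by 3

def prob(subset):
    """P(subset) for equally probable points of D."""
    return Fraction(len(subset), len(D))

# Theorem 1-3: direct count of the union versus the addition rule.
direct = prob(A | B)
by_rule = prob(A) + prob(B) - prob(A & B)
print(direct, by_rule)  # 11/18 11/18

# Product condition of Corollary 1-2 for these two sets.
print(prob(A & B) == prob(A) * prob(B))  # True
```

The last line shows that, for these two subsets, the probability of the intersection happens to equal the product of P(A) and P(B), which agrees with the earlier finding that the conditional probability of A given B equals P(A).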

The applications of these theorems and their corollaries are well illustrated 
by the following simple examples. 

Example 1-1 A bag contains a very large number of red and black balls in 
the ratio 1 red ball to 4 black. If 2 balls are drawn successively from the bag 
at random, what is the probability of selecting 

(a) 2 red balls, 

(b) 2 black balls, 

(c) 1 red and 1 black ball?



Let $A_1$ denote the selection of a red ball first (and either colour second),
and $A_2$ the selection of a red ball second (and either colour first), with $B_1$
and $B_2$ denoting the corresponding selections of a black ball. Then
$A_1 \cap A_2$ is the selection of 2 red balls and, similarly, $B_1 \cap B_2$ is the selection
of 2 black balls. As the balls occur in the ratio 1 red : 4 black it follows that
their relative frequency ratios are 1/5 for a red ball and 4/5 for a black ball, so
$P(A_1) = 1/5$ and $P(B_1) = 4/5$.

The fact that the bag contains a large number of balls implies that the
drawing of one or more balls does not materially alter the relative frequency
ratio that existed at the start, so $P(A_2) = 1/5$ and $P(B_2) = 4/5$. This, together
with the fact that the balls are drawn at random, implies that the drawing of
each ball is an independent event. This independence then
allows the use of Corollary 1-2 to determine the required solutions to (a) and
(b). We find that

(a) $P(A_1 \cap A_2) = (1/5)(1/5) = 1/25$,

(b) $P(B_1 \cap B_2) = (4/5)(4/5) = 16/25$.

Now to answer (c) we notice that there are two mutually exclusive orders
in which a red and a black ball may be selected, namely as the event $C \cup D$
where $C = A_1 \cap B_2$ (red then black) and $D = B_1 \cap A_2$ (black then red).
From Corollary 1-3 we then have that $P(C \cup D) = P(C) + P(D)$, where
P(C) and P(D) are determined by Corollary 1-2. This shows that $P(C)
= P(A_1)P(B_2)$ and $P(D) = P(B_1)P(A_2)$, so that $P(C) = P(D) = (1/5)(4/5)
= 4/25$. The solution to (c) becomes

$$P(C \cup D) = 4/25 + 4/25 = 8/25.$$

The three forms of selection (a), (b), and (c) are themselves mutually
exclusive and it must follow that $P(A_1 \cap A_2) + P(B_1 \cap B_2) + P(C \cup D) = 1$,
as is readily checked. Indeed this result could have been used directly to
calculate $P(C \cup D)$ from $P(A_1 \cap A_2)$ and $P(B_1 \cap B_2)$ in place of the above
argument using Corollary 1-3.

The previous situation becomes slightly more complicated if only a 
limited number of balls are contained in the bag. 

Example 1-2 A bag contains 50 balls of which 10 are red and the remainder 
black. If 2 balls are drawn successively from the bag at random, what is the 
probability of selecting 

(a) 2 red balls, 

(b) 2 black balls, 

(c) 1 red and 1 black ball?

This time the approach must be slightly different because, unlike Example
1-1, the removal of a ball from the bag now materially alters the probabilities
involved when the next ball is drawn. In fact this is a problem involving
conditional probabilities.

Here we shall define A to be the event that the first ball selected is red,
and B to be the event that the second ball selected is red. The probability we
must now evaluate is the probability of occurrence of event B given that
event A has occurred. Expressed in set notation we have to find $P(A \cap B)$,
the probability of occurrence of the event associated with $A \cap B$. This is a
conditional probability with the set associated with event A playing the role
of the reduced sample space. Utilizing this observation we now make use of
Theorem 1-2 to write

$$P(A \cap B) = P(A)\,P(B \mid A).$$

Now the relative frequency of occurrence of a red ball at the first draw is 
10/(10 + 40) = 1/5, so that P(A) = 1/5. (Not till later will we use the fact 
that the relative frequency of occurrence of a black ball is 40/(10 + 40) 
= 4/5.) 

Given that a red ball has been drawn, 9 red balls and 40 black balls remain 
in the bag. If the next ball to be drawn is red then its probability of occurrence 
is the conditional probability P(B\A) = 9/(9 + 40) = 9/49. Hence it follows 
that the solution to (a) is 

$$P(A \cap B) = (1/5)(9/49) = 9/245.$$

It is interesting to compare this with the value 1/25 that was obtained in
Example 1-1 on the assumption that there was virtually an infinite number of
balls in the bag.

If C is defined to be the event that the first ball drawn is black and D the
event that the second ball drawn is black, then to answer (b) we must compute
$P(C \cap D)$. Obviously, $P(C) = 4/5$, and by using an argument analogous to
that above it follows that $P(D \mid C) = 39/(10 + 39) = 39/49$. Hence the solution
to question (b) is

$$P(C \cap D) = (4/5)(39/49) = 156/245.$$

Again this should be compared with the value 16/25 obtained in Example
1-1.

The simplest way to answer (c) is to use the fact that events (a), (b), and
(c) describe the only possibilities and so are mutually exclusive. Hence the
sum of the three probabilities must equal unity. Denoting the probability of
event (c) by P we have

$$P = 1 - P(A \cap B) - P(C \cap D),$$

showing that $P = 1 - 9/245 - 156/245 = 16/49$.
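
The three answers to Example 1-2 can be confirmed by brute-force enumeration of every ordered pair of distinct balls, without appeal to the multiplication rule. This is a sketch only: the labels "R" and "B" and the helper `prob` are illustrative, and it assumes that every ordered draw of two balls is equally probable.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R"] * 10 + ["B"] * 40  # 10 red and 40 black balls

# Every ordered draw of two distinct balls (50 * 49 of them) is equally likely.
draws = list(permutations(range(len(balls)), 2))

def prob(first, second):
    """Probability that the first and second balls have the given colours."""
    hits = sum(1 for i, j in draws if balls[i] == first and balls[j] == second)
    return Fraction(hits, len(draws))

two_red = prob("R", "R")
two_black = prob("B", "B")
one_each = prob("R", "B") + prob("B", "R")
print(two_red, two_black, one_each)  # 9/245 156/245 16/49
```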

It is sometimes helpful to bear in mind the following table in which
equivalent statements are expressed using the alternative languages of sets
and probability theory.

Sets                      Probability

$A \cup B = C$            A + B = C; the event corresponding to C is defined as
                          the occurrence of at least one of the events
                          corresponding to A or B or both.

$A \cap B = C$            AB = C; the event corresponding to C is defined as
                          the occurrence of both of the events corresponding to
                          A and B.

$A \cap B = \emptyset$    AB = 0; events corresponding to A and B are mutually
                          exclusive.

$A = \emptyset$           A = 0; the event corresponding to A does not occur.

$B \subset A$             $B \Rightarrow A$; the event corresponding to B implies that
                          corresponding to A.

$A \setminus B$           the event corresponding to A and not that
                          corresponding to B.

To close this section with a brief examination of repeated trials, the ideas 
of a permutation and a combination must be utilized. The student will already 
be familiar with these concepts from elementary combinatorial algebra and 
so we shall only record two definitions. 

Definition 1-3 A permutation of a set of n mutually distinguishable
objects r at a time is an arrangement, or an enumeration of the objects, in
which their order of appearance counts.

Thus of the five letters a, b, c, d, e the arrangements a, b, c and a, c, b 
represent two different permutations of three of the five letters. These are 
described as permutations of five letters taken three at a time. Other permuta- 
tions of this kind may be obtained by further re-arrangement of the letters 
a, b, c and by the replacement of any of them by either or both of the 
remaining two letters d and e. 

The total number of different permutations of n objects r at a time will be
denoted by ${}^nP_r$ and it is left to the reader to prove as an exercise that

$${}^nP_r = \frac{n!}{(n - r)!}, \tag{1-10}$$

where $n!$ (factorial n) $= n(n - 1)(n - 2) \cdots 3 \cdot 2 \cdot 1$, and we adopt the
convention that $0! = 1$.
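
For the five letters a, b, c, d, e taken three at a time, the formula may be compared with an explicit listing. The sketch uses the standard library routine `itertools.permutations`; the function name `n_p_r` is an illustrative choice.

```python
from itertools import permutations
from math import factorial

def n_p_r(n, r):
    """Number of permutations of n objects r at a time, as in Eqn (1-10)."""
    return factorial(n) // factorial(n - r)

letters = "abcde"
arrangements = list(permutations(letters, 3))

print(len(arrangements), n_p_r(5, 3))  # 60 60
# a, b, c and a, c, b are listed as two different permutations.
print(("a", "b", "c") in arrangements, ("a", "c", "b") in arrangements)  # True True
```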

Definition 1-4 A combination of a set of n mutually distinguishable
objects r at a time is a selection of r objects from the n without regard to
their order of arrangement.

It follows from the definition of a permutation that a set of r objects may
be arranged in r! different ways so that denoting the number of different
combinations of n objects r at a time by $\binom{n}{r}$, we must have

$${}^nP_r = r!\binom{n}{r}.$$

This gives the important result

$$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}. \tag{1-11}$$

In many books it will often be found that the expression ${}^nC_r$ is written in
place of $\binom{n}{r}$. The numbers $\binom{n}{r}$ are usually called binomial coefficients
because of their occurrence in the binomial expansion

$$(p + q)^n = \sum_{r=0}^{n} \binom{n}{r} p^r q^{n-r}, \quad \text{with } n \text{ a positive integer.} \tag{1-12}$$
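
These relations are easily spot-checked numerically. The following sketch uses `math.comb` from the Python standard library for the binomial coefficient; the helper `binomial_sum` and the particular test values of p and q are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial

n, r = 5, 3

# Listing the selections of 3 letters from 5 agrees with the formula (1-11).
print(len(list(combinations("abcde", r))), comb(n, r))  # 10 10

# The number of permutations equals r! times the number of combinations.
print(factorial(n) // factorial(n - r) == factorial(r) * comb(n, r))  # True

def binomial_sum(p, q, n):
    """Right-hand side of the binomial expansion (1-12)."""
    return sum(comb(n, k) * p**k * q ** (n - k) for k in range(n + 1))

p, q = Fraction(2, 3), Fraction(5, 7)
print(binomial_sum(p, q, 4) == (p + q) ** 4)  # True
```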

Now consider an experiment involving a series of independent trials in
each of which only one of two events A or B may occur. Then if the probabilities
of occurrence of events A and B are p and q, respectively, we must
obviously have $p + q = 1$. If n such trials constitute an experiment, we might
wish to know with what probability the experiment may be expected to yield
r events of type A. The statistician will call such a situation repeated independent
trials.

An experiment will be deemed to be successful if r events of type A and
n - r events of type B occur, irrespective of their order of occurrence.
Clearly this can happen in $\binom{n}{r}$ different ways and by Corollary 1-2, since
the trials are independent, the probability of occurrence of any one of these
events will be $p^r(1 - p)^{n-r}$. Hence, as the results of trials are also mutually
exclusive, it follows from Corollary 1-3 that the required probability P(r) of
occurrence of r events of A each with probability of occurrence p in n
independent trials is

$$P(r) = \binom{n}{r} p^r (1 - p)^{n-r}. \tag{1-13}$$

Identifying the p and q of Eqn (1-12) with the probabilities of occurrence
of the events A and B just discussed, we see that $q = 1 - p$, so that Eqn (1-12)
takes the form

$$1 = \sum_{r=0}^{n} \binom{n}{r} p^r (1 - p)^{n-r}. \tag{1-14}$$

Each term on the right-hand side of Eqn (1-14) then represents the probability
of occurrence of an event of the form just discussed. For example, the first
term

$$P(0) = \binom{n}{0}(1 - p)^n$$

is the probability that event A will never occur in a series of n independent
trials, whilst the third term

$$P(2) = \binom{n}{2} p^2 (1 - p)^{n-2}$$

is the probability that event A will occur exactly twice in a series of n independent
trials.

The n + 1 numbers P(r), r = 0, 1, . . ., n have, by definition, the property
that

$$P(0) + P(1) + \cdots + P(n) = 1, \tag{1-15}$$



and they are said to define a discrete probability distribution. It is conventional
to plot them in histogram fashion when they illustrate the probabilities
to be associated with the n + 1 possible outcomes of an experiment involving
n trials. Fig. 1-7 (a) illustrates the case in which n = 4 and p = 1/4 so that

$$P(0) = \binom{4}{0}\left(\tfrac{1}{4}\right)^0\left(\tfrac{3}{4}\right)^4 = \frac{81}{256}, \qquad P(1) = \binom{4}{1}\left(\tfrac{1}{4}\right)\left(\tfrac{3}{4}\right)^3 = \frac{27}{64},$$

and, similarly, P(2) = 54/256, P(3) = 3/64, and P(4) = 1/256.

[Fig. 1-7: panel (a) plots P(r) against r for n = 4, p = 1/4; panel (b) plots U(r) against r.]

Fig. 1-7 Binomial distribution: (a) binomial probability density function; (b)
binomial cumulative distribution function.

Because of the origin of this
distribution, Eqn (1-13) is said to define the binomial distribution. This
distribution is historically associated with Jacob Bernoulli (1654-1705) and
experiments of the type just examined are sometimes referred to as Bernoullian
trials. When the cumulative total

$$U(r) = \sum_{s=0}^{r} P(s), \tag{1-16}$$

is plotted in histogram fashion against r the result is called the cumulative 
distribution function. The cumulative distribution function corresponding to 
Fig. 1-7 (a) is shown in Fig. 1-7 (b). It is conventional to refer to the P(r) 
as the probability density function or the frequency function since it describes 
the proportion of observations appropriate to the value of r. 
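
The values plotted in the two panels of Fig. 1-7 may be tabulated as follows. This is a minimal sketch taking n = 4 and p = 1/4, the value of p consistent with P(4) = 1/256 quoted above; the function names `binomial_pmf` and `binomial_cdf` are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """P(r): probability of exactly r events of type A in n trials, Eqn (1-13)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def binomial_cdf(r, n, p):
    """U(r): cumulative total of P(0), P(1), ..., P(r), Eqn (1-16)."""
    return sum(binomial_pmf(s, n, p) for s in range(r + 1))

n, p = 4, Fraction(1, 4)
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]
cdf = [binomial_cdf(r, n, p) for r in range(n + 1)]

print([str(x) for x in pmf])  # ['81/256', '27/64', '27/128', '3/64', '1/256']
print([str(x) for x in cdf])  # ['81/256', '189/256', '243/256', '255/256', '1']
```

Fractions are printed in lowest terms, so that P(2) appears as 27/128 rather than 54/256.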

Example 1-3 If an unbiased coin is tossed six times, what is the probability
that only two 'heads' will occur in the sequence of results?
As the coin is unbiased p = q = 1/2 and so

$$P(2) = \binom{6}{2}\left(\tfrac{1}{2}\right)^2\left(\tfrac{1}{2}\right)^4 = \frac{15}{64}.$$



It is an immediate consequence of Eqn (1-13) that

(a) if A occurs with probability p in independent trials then the probability
that it will occur at least r times in n trials is

$$\sum_{s=r}^{n} \binom{n}{s} p^s (1 - p)^{n-s};$$

(b) if A occurs with probability p in independent trials then the probability
that it will occur at most r times in n trials is

$$\sum_{s=0}^{r} \binom{n}{s} p^s (1 - p)^{n-s};$$

and to this we may add Eqn (1-13) in this form:

(c) if A occurs with probability p in independent trials then the probability
that it will occur exactly r times in n trials is

$$\binom{n}{r} p^r (1 - p)^{n-r}.$$



Example 1-4 What is the probability of hitting a target when three shells
are fired, assuming each to have a probability 1/2 of making a hit?

Obviously here p = 1/2 and we will have satisfied the conditions of the
question if at least one shell finds the target. Accordingly, using (a) above,
the result is

$$\sum_{s=1}^{3} \binom{3}{s}\left(\tfrac{1}{2}\right)^s\left(\tfrac{1}{2}\right)^{3-s}.$$

Hence the required probability is 3/8 + 3/8 + 1/8 = 7/8.
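
The 'at least', 'at most', and 'exactly' forms can be sketched as three small functions and applied to Examples 1-3 and 1-4. The function names are illustrative, and the hit probability of 1/2 per shell in Example 1-4 is taken here as an assumption.

```python
from fractions import Fraction
from math import comb

def exactly(r, n, p):
    """Probability that A occurs exactly r times in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def at_least(r, n, p):
    """Probability that A occurs at least r times in n independent trials."""
    return sum(exactly(s, n, p) for s in range(r, n + 1))

def at_most(r, n, p):
    """Probability that A occurs at most r times in n independent trials."""
    return sum(exactly(s, n, p) for s in range(0, r + 1))

half = Fraction(1, 2)
print(exactly(2, 6, half))       # 15/64  (two heads in six tosses)
print(at_least(1, 3, half))      # 7/8    (at least one hit from three shells)
print(1 - exactly(0, 3, half))   # 7/8    (same result via the complement)
```

The last line illustrates that 'at least one' is often most quickly found as one minus the probability of no occurrence at all.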

So far the sample spaces we have used have involved discrete points, and
it is for this reason that the term discrete has been used in conjunction with
the definition of the binomial distribution. In other words, in discrete distributions,
no meaning is to be attributed to points that are intermediate
between the discrete sample space points. In particular, referring to Fig.
1-6 (a), there is no score to be attributed to a point with horizontal coordinate
2.5 and vertical coordinate 4.2 any more than there is to the point
with horizontal coordinate 11 and vertical coordinate 9.

However, situations occur in which perfectly satisfactory sample spaces 
can be defined which associate an event with every point of the sample space, 
and not just certain discrete points. The definition of a distribution function 
appropriate to this case requires ideas from the calculus and will not be 
discussed here. In statistics such distributions are called continuous distri- 
butions. 

1-3 Integers, rationals and arithmetic laws

The reader will already be familiar with the fact that if the arithmetic opera- 
tion of addition is performed on the natural numbers, or the positive integers 
as they are often called, the result will also be a positive integer. Written 
symbolically this statement becomes $a, b \in N \Rightarrow (a + b) \in N$. However the
arithmetic operation of subtraction is less simple, since we know from direct
experience that even when $a, b \in N$, this does not necessarily imply that
$a - b$ is a positive integer. Indeed, in general $a - b$ may be equal to some
positive or negative integer or to zero.

Thus an attempt always to express the result of subtraction of natural
numbers in terms of the natural numbers themselves must fail. This is usually
expressed by saying that the system of natural numbers N is not closed with
respect to subtraction. The difficulty is of course resolved by supplementing
the set of natural numbers N by the set $N^* = \{\ldots, -3, -2, -1, 0\}$ of
negative integers and zero. If now in place of N we use the complete set of
integers $I = N^* \cup N$ already encountered in Problem 1-1, the assertions
$a, b \in I \Rightarrow (a + b) \in I$ and $a, b \in I \Rightarrow (a - b) \in I$ become unconditionally true.

The need to generalize the notion of the natural numbers N to the complete
set of integers I is thus seen to arise as a natural result of seeking a
number system in which the binary arithmetic operation inverse to addition,
namely the operation of subtraction, is always possible. However, the set of
numbers I is still far from adequate to enable everyday practical arithmetic
to be performed. To see this it is only necessary to comment that although the
product of two integers belonging to I itself lies in I, the quotient of two
integers belonging to I does not necessarily lie in I. Thus the complete set of
integers I is not closed with respect to division. Symbolically we can write
this as $a, b \in I \Rightarrow ab \in I$, but $a, b \in I \Rightarrow a/b \in I$ only if $b \neq 0$ and $a = kb$
with $k \in I$. The symbol $\neq$ used here is to be read 'not equal to' and the condition
involving k simply ensures that the quotient a/b is integral.

Here again the operations of multiplication and division are inverse
binary arithmetic operations. To remove the artificial restriction on division,
so that the quotient of any two non-zero integers becomes a number in some
number system, we must still further extend the system I of integers. This is
achieved by introducing the familiar system R* of rational numbers, which is
defined as the set of all numbers of the form a/b, where $b \neq 0$ and $a, b \in I$.
Obviously, since integers are just a special case of rational numbers and, for
example, 2 is represented by any of the rationals 2/1, 4/2, 10/5, . . ., the set
R* also contains all the integers and so we may write $I \subset R^*$.

Numerous though the rational numbers obviously are, we now show how 
they may be arranged in a definite order and counted. One way in which this 
may be achieved is indicated in the following array which recognizes as 
different all rational representations in which cancelling of common factors 
has not been performed. Thus, for example, in this scheme 4/2, 6/3, 8/4, . . ., 
are counted as different rational numbers, despite the fact that they all 
represent 2. If desired these repetitions may be omitted from the resulting 
sequence of rational numbers, though the matter is not important. The 
counting or enumeration of the rationals proceeds in the order indicated by 
the arrows : 
















[Array of rational numbers, with arrows indicating the order of enumeration.]



If this form of enumeration is adopted then the first few rationals to be 
specified are 

0, 1, 1/2, 1, 2, -2, -1, -1/2, -1, -3/2, -3, . . .. 

As already mentioned, if desired the repetitions may be deleted, so that the 
start of the sequence would then become 

0, 1, 1/2, 2, -2, -1, -1/2, -3/2, -3, . . .. 

Clearly all rationals are included somewhere in this scheme, so that as
each one may be put into correspondence with an integer, the mathematician
is entitled to say that the rationals are countable, despite the fact that they are
infinite in number. What this construction has established is the rather
remarkable result that the rationals are no more numerous than the set of
positive integers themselves.
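
The idea of counting the rationals can be made concrete with a short generator. It should be stressed that the ordering used below is not the one indicated by the arrows in the array above: it is a different, commonly used scheme which lists the fractions in order of increasing numerator plus denominator and omits repetitions. The name `rationals` is an illustrative choice.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals():
    """Yield every rational number exactly once (an alternative ordering)."""
    yield Fraction(0)
    total = 2  # numerator + denominator of the positive fraction
    while True:
        for num in range(1, total):
            den = total - num
            if gcd(num, den) == 1:  # skip repetitions such as 2/2 or 4/2
                yield Fraction(num, den)
                yield Fraction(-num, den)
        total += 1

# The position of each rational in the list is the integer paired with it.
print([str(x) for x in islice(rationals(), 9)])
# ['0', '1', '-1', '1/2', '-1/2', '2', '-2', '1/3', '-1/3']
```

Any given rational a/b in lowest terms is reached after finitely many steps, namely once the running total has reached |a| + b, which is all that countability requires.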

It might, at first sight, seem that the rationals R* must contain all possible 
numbers. In fact this is far from the truth since it is possible to show that 
numbers exist which are not expressible as a rational fraction and yet which 
lie between two rationals, however close they may be. For obvious reasons 
they are called irrational numbers, and to substantiate our assertion we now 
prove the existence of one such number. 

We will show that $\sqrt{2}$ is irrational or, to phrase the statement more
precisely, that there is no fraction of which the square is 2. The argument
starts from a given assumption and then produces a contradiction, thereby
showing that the original assumption must be false. It is called an argument
by contradiction and is a device frequently used in higher mathematics.

Suppose that m/n is such that m and n are integers having no common
factor and $(m/n)^2 = 2$. Then $m^2 = 2n^2$ so that $m^2$ must be even and hence
m itself is even. Because m is even we may set m = 2r, where r is some integer.
(Why?) Then $4r^2 = 2n^2$, or $2r^2 = n^2$, which now shows that $n^2$ and hence n
must be even. The fact that n is even now allows us to set n = 2s and thus the
numbers m and n have a common factor 2, contradicting the initial assumption.
Hence the original assumption that $\sqrt{2}$ is capable of representation in
the rational form m/n is false. We have thus proved that $\sqrt{2}$ is an irrational
number.
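
A finite computer search cannot replace the argument by contradiction, but it does illustrate what the argument asserts. The sketch below looks for integers m and n, with n up to a chosen bound, such that the square of m equals twice the square of n; the function name and the bound are illustrative assumptions.

```python
from math import isqrt

def has_rational_sqrt2(max_denominator):
    """Search for integers m, n with m*m == 2*n*n and 1 <= n <= max_denominator."""
    for n in range(1, max_denominator + 1):
        m = isqrt(2 * n * n)  # largest integer whose square does not exceed 2*n*n
        if m * m == 2 * n * n:
            return True
    return False

print(has_rational_sqrt2(100000))  # False
```

The search finds no such pair, as the proof guarantees it never will, however large the bound is taken.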

It is established in higher mathematics courses that the irrational numbers 
are so much more numerous than the rationals that they cannot be enumerated. 
We make no attempt to justify this claim here. Instead we refer the interested 
reader to Problems 1-32 to 1-35 if he wishes to gain a little more insight into 
the relationship that exists between the rationals and the irrationals. A final 
important result arising from a deeper study of these matters, and to which 
we make only a passing reference, is the fact that between them, the rational 
and the irrational numbers exhaust all the possible types of numbers. In 
effect this is saying that if we work with real rational and irrational numbers, 
then there are no gaps left in the number system that can only be filled by the 
introduction of yet another kind of number. This is important because it means 
that however we may arrive at a number, as the result of a finite or an infinite 
sequence of operations, it will either be a rational or an irrational number. 

If the set R* of rational numbers is supplemented by the inclusion of the 
irrational numbers, the resulting set R is called the real number system or, 
the field of real numbers. The fact that R contains all possible types of real 
numbers is expressed by saying that the set of real numbers R is complete. 
Consequently, until we have occasion to consider entities such as √(−1) 
there will be no need for us to work outside the real number system R. 

Numbers called transcendental numbers form an important subset of the 
irrational numbers. These are numbers like e and π which are not defined as 
the root of a polynomial with rational coefficients (cf. § 2-3). 

For future reference it will be useful to summarize the basic properties 
of the field of real numbers already known to the reader. We now do this 
making full use of the mathematical shorthand so far introduced. 

Additive properties 

A-1  a, b ∈ R ⇒ (a + b) ∈ R; R is closed with respect to addition. 
A-2  a, b ∈ R ⇒ a + b = b + a; addition is commutative. 
A-3  a, b, c ∈ R ⇒ (a + b) + c = a + (b + c); addition is associative. 
A-4  For every a ∈ R there exists a number 0 ∈ R such that 0 + a = a; 
     there is a zero element in R. 
A-5  If a ∈ R then there exists a number −a ∈ R such that −a + a = 0; 
     each number has a negative. 

Multiplicative properties 

M-1  a, b ∈ R ⇒ ab ∈ R; R is closed with respect to multiplication. 
M-2  a, b ∈ R ⇒ ab = ba; multiplication is commutative. 
M-3  a, b, c ∈ R ⇒ (ab)c = a(bc); multiplication is associative. 
M-4  There exists a number 1 ∈ R such that 1 . a = a for all a ∈ R; 
     there is a unit element in R. 
M-5  Let a be a non-zero number in R, then there exists a number a⁻¹ ∈ R 
     such that a⁻¹a = 1; each non-zero number has an inverse. Usually 
     we shall write 1/a in place of a⁻¹, so that the two expressions are 
     to be taken as being synonymous. 

Distributive property 

D-1  a, b, c ∈ R ⇒ a(b + c) = ab + ac; multiplication is distributive. 

The above results are self-evident for real numbers and are usually called 
the real number axioms. They are used by mathematicians as the logical basis 
for our number system. Later we shall encounter other systems of objects 
which, though sharing many of the properties of real numbers, are not them- 
selves numbers. For future reference we mention matrices, for which M-1 to 
M-5 are not generally true, and vectors, for which two forms of multiplication 
exist and for which M-5 has no meaning. 
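
To make the remark about matrices concrete, the following Python sketch multiplies two 2 × 2 matrices in both orders and shows that the commutative property M-2 fails for them. The helper `matmul2` and the particular matrices P and Q are our own illustrative choices, not taken from the text.

```python
def matmul2(p, q):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

P = ((1, 1), (0, 1))
Q = ((1, 0), (1, 1))

print(matmul2(P, Q))   # ((2, 1), (1, 1))
print(matmul2(Q, P))   # ((1, 1), (1, 2))
```

The two products differ, so PQ ≠ QP for this pair.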

It is an immediate consequence of these axioms that commonplace 
arithmetic operations may be performed without question. For example, it is 
fundamental to arguments that a − b = 0 ⇔ a = b, and aξ = aη ⇒ ξ = η 
if a ≠ 0. These, and other elementary results of similar form, follow directly 
as a result of simple applications of the axioms. As it would be out of place 
to develop these ideas here we shall indicate the proof of just one such result, 
stating the others in the form of Problem 1-37 which is left to be attempted by 
the reader who wishes to question further the basis of the real number system. 

We prove that there is a unique zero in R. The argument is again by 
contradiction, for we first suppose that two different zero elements exist and 
denote them by 0 and 0'. Then by A-4 it follows that 0 + 0' = 0' and 0' + 0 
= 0, whence by A-2 we must have 0 = 0', thereby establishing the uniqueness 
of the zero. 

So far our list of properties of real numbers has been concerned only with 
equalities. The valuable property of real numbers that they can be arranged 
according to size, or ordered, has so far been overlooked. It is of course this 
property that allows us to represent real numbers by points on a line and 
thereby to construct graphs and other valuable geometrical representations. 
Ordering is achieved by utilizing the concept 'greater than' which, when used 
in the form 'a greater than b', is denoted by a > b. Hence to the other real 
number axioms must be added: 

Order properties 

O-1  If a ∈ R then exactly one of the following is true: either a > 0 or 
     a = 0 or −a > 0. 

O-2  a, b ∈ R, a > 0, b > 0 ⇒ a + b > 0, and ab > 0. 

We now define a > b and a < b, the latter being read 'a less than b', by 
a > b ⇒ a − b > 0 and a < b ⇒ b − a > 0. The following results are 
obvious consequences of the real number system and are called inequalities. 
In places they also involve the symbol ≥ which is to be read 'greater than or 
equal to'. 

Elementary inequalities in R 

I-1  a > b and c > d ⇒ a + c > b + d. 

I-2  a > b > 0 and c > d > 0 ⇒ ac > bd. 

I-3  k > 0 and a > b ⇒ ka > kb. 

I-4  a > b ⇒ −a < −b. 

I-5  a < 0, b > 0 ⇒ ab < 0; a < 0, b < 0 ⇒ ab > 0. 

I-6  a > 0 ⇒ a⁻¹ > 0; a < 0 ⇒ a⁻¹ < 0. 

I-7  a > b > 0 ⇒ b⁻¹ > a⁻¹ > 0; a < b < 0 ⇒ b⁻¹ < a⁻¹ < 0. 

An important use of inequalities is in defining intervals on a line and 
regions in a plane. Using the order property of numbers to associate numbers 
with points on a line, an interval on a line may be considered to be a segment 
of the line between two given points or numbers, a and b, say. Three cases 
arise according as to whether (a) both end points are included in the interval, 
(b) both end points are excluded from the interval, or (c) one is included and 
one is excluded. These are called, respectively, (a) a closed interval, (b) an 
open interval, (c) a semi-open interval. Namely, an interval is closed at an 
end which contains the end point, otherwise it is open at that end. In terms of 
the points a and b and the variable x representing an arbitrary point on the 
line these are written: 

(a) a ≤ x ≤ b; closed interval; 

(b) a < x < b; open interval; 

(c) a < x ≤ b or a ≤ x < b; semi-open interval. 

Thus 1 ≤ x < 2 defines the semi-open interval containing the point x = 1 
and the points up to, but not including, x = 2. These are represented in 
Fig. 1-8 in which a solid line represents points in the interval, a circle represents 
an excluded point, and a dot an included point. 

Fig. 1-8 Intervals on a line: (a) closed interval a ≤ x ≤ b; (b) open interval 
a < x < b; (c) semi-open interval a < x ≤ b. 

Special cases occur when one or both of the end points of the interval 
are at infinity. The intervals −∞ < x < a and b < x < ∞ are called 
semi-infinite intervals and −∞ < x < ∞ is an unbounded interval or, more 
simply, the complete real line. 

We illustrate the corresponding definition of a region in the (x, y)-plane 
by considering the three inequalities x² + y² ≤ a², y < x, x ≥ 0. The first 
defines the interior and boundary of a circle of radius a centred on the origin, 
the second defines points below, but not on, the straight line y = x, and the 
third defines points in the right half of the (x, y)-plane including the points 
on the y-axis itself. These curves represent boundaries of the regions in 
question and the boundary points are only to be included in the region when 
possible equality is indicated by use of the signs ≥ or ≤. The three regions 
are indicated in Fig. 1-9 (a) in which a full line indicates that points on it are 
to be included, a dotted line indicates that points on it are to be excluded, and 
shading indicates the side of the line on which the region in question must lie. 
Fig. 1-9 (b) indicates the region in which all the inequalities are satisfied. 

Fig. 1-9 Regions in plane: (a) region boundaries x² + y² = a², y = x, and x = 0; 
(b) region x² + y² ≤ a², y < x, x ≥ 0. 

Simple inequalities of the form (x + 1)(x + 3) > (x − 1)(x − 2) also 
define intervals. For, clearing the brackets, we see that x² + 4x + 3 
> x² − 3x + 2 which, by simple application of the elementary inequalities 
just listed, reduces to x > −1/7 defining a semi-infinite interval, open at the 
end x = −1/7. 
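
The reduction to x > −1/7 can be checked with exact rational arithmetic. The following Python sketch is only an illustration; the helper name `holds` and the step size are our own choices.

```python
from fractions import Fraction

def holds(x):
    """True when (x + 1)(x + 3) > (x - 1)(x - 2)."""
    return (x + 1) * (x + 3) > (x - 1) * (x - 2)

boundary = Fraction(-1, 7)
step = Fraction(1, 1000)

# Just below the end point, at it, and just above it.
print(holds(boundary - step), holds(boundary), holds(boundary + step))
# prints: False False True
```

The end point itself fails the strict inequality, which is why the interval is open at x = −1/7.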

The elementary inequalities may often be used to advantage to simplify 
complicated algebraic expressions by yielding helpful qualitative information 
as the following example indicates. 

Example 1-5 Prove that if a₁, a₂, ..., aₙ and b₁, b₂, ..., bₙ are positive 
real numbers, then 

\[
\min_{1 \le r \le n}\left(\frac{a_r}{b_r}\right)
\le \frac{a_1 + a_2 + \cdots + a_n}{b_1 + b_2 + \cdots + b_n}
\le \max_{1 \le r \le n}\left(\frac{a_r}{b_r}\right).
\]

Here the left-hand side of the inequality is to be interpreted as meaning 
the minimum value of the expression (aᵣ/bᵣ), with r assuming any of the 
integral values between 1 and n, and the right-hand side is to be similarly 
interpreted reading maximum in place of minimum. The result follows by 
noticing that 

\[
\frac{a_1 + a_2 + \cdots + a_n}{b_1 + b_2 + \cdots + b_n}
= \frac{1}{\sum_{r=1}^{n} b_r}
\left[ b_1\left(\frac{a_1}{b_1}\right) + b_2\left(\frac{a_2}{b_2}\right)
+ \cdots + b_n\left(\frac{a_n}{b_n}\right) \right],
\]

where \(\sum_{r=1}^{n} b_r = b_1 + b_2 + \cdots + b_n\). For if each of the 
expressions (a₁/b₁), (a₂/b₂), ..., (aₙ/bₙ) is replaced by the smallest of these 
ratios, which could be the value taken by all the expressions if 
a₁ = a₂ = ... = aₙ > 0 and b₁ = b₂ = ... = bₙ > 0, then 

\[
\frac{a_1 + a_2 + \cdots + a_n}{b_1 + b_2 + \cdots + b_n}
\ge \frac{(b_1 + b_2 + \cdots + b_n)}{\sum_{r=1}^{n} b_r}
\min_{1 \le r \le n}\left(\frac{a_r}{b_r}\right)
= \min_{1 \le r \le n}\left(\frac{a_r}{b_r}\right),
\]

which is the left half of the inequality. The right half follows by identical 
reasoning if maximum is written in place of minimum. 
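
A numerical instance may help fix the meaning of the inequality in Example 1-5. The Python sketch below evaluates the three quantities exactly for one arbitrary choice of positive numbers; the function name `ratio_bounds` and the sample values are our own and are not taken from the text.

```python
from fractions import Fraction

def ratio_bounds(a, b):
    """Return (min of a_r/b_r, ratio of the sums, max of a_r/b_r)."""
    ratios = [Fraction(x) / Fraction(y) for x, y in zip(a, b)]
    return min(ratios), Fraction(sum(a)) / Fraction(sum(b)), max(ratios)

a = [1, 5, 2, 7]   # illustrative positive values
b = [3, 4, 6, 2]

lo, mid, hi = ratio_bounds(a, b)
print(lo, mid, hi)         # 1/3 1 7/2
print(lo <= mid <= hi)     # True
```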



1-4 Absolute value of a real number 

Definition 1-5 The absolute value |a| of the real number a provides a 
measure of its size without regard to sign, and is defined as follows: 

\[
|a| = \begin{cases} a & \text{when } a \ge 0, \\ -a & \text{when } a < 0. \end{cases}
\]

Thus if a = 3, then |a| = 3 and if a = −5.6 then |a| = 5.6. 
There are three immediate consequences of this definition which we now 
enumerate as 

Theorem 1-4 If a, b ∈ R then 

(a) |ab| = |a| |b|, 

(b) |a + b| ≤ |a| + |b|, 

(c) |a − b| ≥ ||a| − |b||. 

The proof is simply a matter of enumerating the possible combinations 
of positive and negative a and b, and then making a direct application of the 
definition of the absolute value. We shall only illustrate the proof of (a). 

There are four cases to be considered: firstly a ≥ 0, b ≥ 0, secondly 
a ≥ 0, b < 0, thirdly a < 0, b ≥ 0, and fourthly a < 0, b < 0. If a ≥ 0, 
b ≥ 0 then ab ≥ 0 and so |ab| = ab = |a| |b|. The second and third situations 
are essentially similar so we shall discuss only the second. As a ≥ 0, b < 0 
we have ab ≤ 0, whence |ab| = −ab = a(−b) = |a| |b|. In the fourth case 
ab > 0, so |ab| = ab = (−a)(−b) = |a| |b|, establishing (a). For reasons we 
give later, result (b) is usually called the triangle inequality. 
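
The three parts of Theorem 1-4 can also be spot-checked over a grid of positive, zero, and negative rationals. The Python sketch below is an illustration only; the function name `check_theorem` and the grid are our own, and a finite check of this kind does not replace the case-by-case proof.

```python
from fractions import Fraction
from itertools import product

def check_theorem(values):
    """Check parts (a), (b), (c) of Theorem 1-4 for every pair in values."""
    for a, b in product(values, repeat=2):
        assert abs(a * b) == abs(a) * abs(b)          # (a)
        assert abs(a + b) <= abs(a) + abs(b)          # (b) triangle inequality
        assert abs(a - b) >= abs(abs(a) - abs(b))     # (c)
    return True

values = [Fraction(k, 3) for k in range(-9, 10)]
print(check_theorem(values))   # True
```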

The absolute value may also be used to define intervals since an expression 
of the form |a − x| ≥ 2 implies two inequalities according as a − x is 
positive or negative. If a − x ≥ 0 then |a − x| = a − x and we have 
a − x ≥ 2 or x ≤ a − 2. However if a − x < 0, then by the definition of the 
absolute value of a − x we must have |a − x| = −(a − x) showing that 
−(a − x) ≥ 2, or, x ≥ a + 2. Taken together the results require that x 
may be equal to or greater than 2 + a or equal to or less than a − 2. x may 
not lie in the intervening interval of length 4 between x = a − 2 and x = a + 2. 
This is illustrated in Fig. 1-10 (a) where a solid line is again used to indicate 
points in the interval satisfied by |a − x| ≥ 2 and the dots are to be included 
in the appropriate intervals. 

By exactly similar reasoning we see that if we consider the inequality 
1 < |x + 1| ≤ 2, then if x + 1 ≥ 0, |x + 1| = x + 1 and the inequality 
becomes 1 < x + 1 ≤ 2. Hence the interval is 0 < x ≤ 1. However, if 
x + 1 < 0, then |x + 1| = −x − 1 and so the inequality becomes 
1 < −x − 1 ≤ 2 giving rise to the interval −3 ≤ x < −2. These intervals 
are shown in Fig. 1-10 (b) with circles indicating points excluded from the 
end of the solid line intervals and dots indicating points to be included. 
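
The interval descriptions obtained from the two absolute value inequalities can be compared with the inequalities themselves at a set of sample points. The Python sketch below assumes the readings |a − x| ≥ 2 and 1 < |x + 1| ≤ 2; the function names, the value a = 5, and the sample grid are our own illustrative choices.

```python
def in_first_set(x, a):
    """Direct test of |a - x| >= 2."""
    return abs(a - x) >= 2

def in_first_intervals(x, a):
    """Interval form: x <= a - 2 or x >= a + 2."""
    return x <= a - 2 or x >= a + 2

def in_second_set(x):
    """Direct test of 1 < |x + 1| <= 2."""
    return 1 < abs(x + 1) <= 2

def in_second_intervals(x):
    """Interval form: -3 <= x < -2 or 0 < x <= 1."""
    return -3 <= x < -2 or 0 < x <= 1

a = 5
samples = [k / 4 for k in range(-40, 41)]   # quarters are exact in binary

print(all(in_first_set(x, a) == in_first_intervals(x, a) for x in samples))
print(all(in_second_set(x) == in_second_intervals(x) for x in samples))
```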



Fig. 1-10 Intervals on a line: (a) |a − x| ≥ 2; (b) 1 < |x + 1| ≤ 2. 

1-5 Representation of numbers 

The decimal representation of real numbers is usual in all ordinary arithmetic 
work and involves expressing a real number as the sum of an integral part 
and a decimal fraction. Each of the parts is represented as the sum of multiples 
of powers of 10, with the powers being positive integers or zero when repre- 
senting the integral part and negative integers when representing the decimal 
part. The number 10 that forms the basis of the decimal system is called the 
base of the number system. 

The integral part r of a finite real number a is thus expressible as 

\[ r = a_n(10^n) + a_{n-1}(10^{n-1}) + \cdots + a_1(10^1) + a_0(10^0), \]

where n is suitably chosen, and the coefficients aᵢ are either zero or an integer 
between 1 and 9. Hence, in reality, the number 2049 is a convenient 
representation of 2(10³) + 0(10²) + 4(10¹) + 9(10⁰), with the positions of the 
digits indicating the positive powers of 10 by which they are to be multiplied 
before addition. 

Similarly, if the decimal fraction part d of a real number a terminates 
after n decimal places, then it is expressible in the form 

\[ d = \frac{b_1}{10} + \frac{b_2}{10^2} + \cdots + \frac{b_n}{10^n}, \]

with the coefficients bᵢ again being either zero or an integer between 1 and 9. 
Hence the decimal number 0.3012 is, in reality, the representation of 

\[ \frac{3}{10} + \frac{0}{10^2} + \frac{1}{10^3} + \frac{2}{10^4}, \]

with the positions of the digits indicating the negative powers of 10 by 
which they are to be multiplied before addition. 

In general then, the decimal number that is written 

\[
\underbrace{a_m a_{m-1} \cdots a_1 a_0}_{(m+1)\ \text{digits}}
\,.\,
\underbrace{b_1 b_2 \cdots b_n}_{n\ \text{digits}}
\]

and which terminates after n decimal places, is the representation of 

\[
a_m(10^m) + a_{m-1}(10^{m-1}) + \cdots + a_1(10^1) + a_0(10^0)
+ \frac{b_1}{10} + \frac{b_2}{10^2} + \cdots + \frac{b_n}{10^n}.
\]

Consideration of the representation of non-terminating decimal fractions 
and irrational numbers will be postponed until we discuss sequences and 
limits, since the approximation of real numbers by rationals has not yet been 
discussed. 

There is no reason why the base of the number system should not be any 
integer N > 1 and, indeed, in digital computing extensive use is made of the 
binary system. This is the system of representation using the base 2. Hence a 
binary number will contain only the digits 1 and 0 with their position 
indicating the power of 2 involved. Thus we may write 

\[ 11 = 1(2^3) + 0(2^2) + 1(2^1) + 1(2^0) \]

so that the binary representation of 11 is 1011. Similarly, the rational number 
9/16 may be written 

\[ \frac{9}{16} = \frac{1}{2} + \frac{0}{2^2} + \frac{0}{2^3} + \frac{1}{2^4}, \]

showing that its binary fraction form is 0.1001. Hence the binary form of the 
number 11 9/16 becomes 1011.1001 and, as in the case of decimals, the position of 
a digit relative to the binary point indicates the power of two by which it is 
to be multiplied before addition. 
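
The conversion just described can be sketched in a few lines of Python. The function `to_binary` below is our own illustrative helper, restricted to non-negative rationals: it takes the integral part directly and produces the fractional digits by repeated doubling, stopping after an assumed limit of 16 places if the fraction does not terminate.

```python
from fractions import Fraction

def to_binary(value, max_fraction_digits=16):
    """Binary digits of a non-negative rational as 'integer.fraction'."""
    value = Fraction(value)
    whole = value.numerator // value.denominator
    frac = value - whole
    digits = []
    while frac and len(digits) < max_fraction_digits:
        frac *= 2                  # shift the binary point one place right
        bit = int(frac >= 1)       # the digit that has crossed the point
        digits.append(str(bit))
        frac -= bit
    integer_part = bin(whole)[2:]  # strip Python's '0b' prefix
    return integer_part + ("." + "".join(digits) if digits else "")

print(to_binary(11))                      # 1011
print(to_binary(Fraction(9, 16)))         # 0.1001
print(to_binary(11 + Fraction(9, 16)))    # 1011.1001
```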

It is easily verified that the addition and multiplication tables for binary 
numbers are as illustrated in the following two tables: 

    Binary addition        Binary multiplication 

     + | 0  1               × | 0  1 
    ---+------             ---+------ 
     0 | 0  1               0 | 0  0 
     1 | 1  0               1 | 0  1 

Both tables are entered by selecting one digit in the first column and one 
in the first row, when the result of the operation appropriate to the table, 
namely addition or multiplication, is shown in the body of the table. For 
example, using the addition table and taking the digits 1 in the first column 
and 0 in the first row we see that 1 + 0 = 1. Similarly, taking the digits 
1 in the first column and 1 in the first row we see that 1 + 1 = 0. The 
interpretation of this latter result is, of course, that a digit 1 must be 
transferred to the next higher power of 2, corresponding to the transference of 
multiples of powers of 10 when performing ordinary addition. The multiplication 
table is straightforward and needs no further comment. 

The examples that now follow illustrate the addition, multiplication, and 
subtraction of simple binary numbers. We shall let a = 12, b = 11 and form 
a + b, ab, and a − b using binary notation. The binary representations of a 
and b are a = 1100, b = 1011 and so we have: 



Addition 

      1 1 0 0 + 
      1 0 1 1 
    --------- 
    1₁0 1 1 1 

Multiplication 

            1 1 0 0 × 
            1 0 1 1 
    --------------- 
            1 1 0 0 
          1 1 0 0 
      1 1 0 0 
    --------------- 
    1 0 0 0 0 1 0 0 

Here the subscript 1 has been used to indicate the transference of a digit 1 
corresponding to the result 1 + 1 = 0. 

The subtraction a − b is equally straightforward provided it is recalled 
that when the subtraction of digits 0 − 1 is encountered, it is necessary to 
'borrow' a digit 1 from the next higher position in the number b. Thus the 
result would be to write 1 in place of 0 − 1 and to add 1 to the next higher 
position in b. 

Subtraction 

    1 1 0 0 − 
    1 0 1 1 
    ------- 
    0 0 0 1 

The expressions a + b, ab, and a − b for a = 12, b = 11 are thus 

\[ a + b = 1(2^4) + 0(2^3) + 1(2^2) + 1(2^1) + 1(2^0) = 23, \]
\[ ab = 1(2^7) + 0(2^6) + 0(2^5) + 0(2^4) + 0(2^3) + 1(2^2) + 0(2^1) + 0(2^0) = 132, \]
\[ a - b = 0(2^3) + 0(2^2) + 0(2^1) + 1(2^0) = 1. \]
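
The digit-by-digit addition rule, including the transfer of a digit 1 when 1 + 1 = 0 is met, can be sketched in Python as follows. The function `add_binary` is our own illustrative helper working on strings of binary digits; the product and difference are checked here with Python's built-in integer conversion rather than by hand.

```python
def add_binary(x, y):
    """Add two binary strings digit by digit using the addition table."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    carry, out = 0, []
    for dx, dy in zip(reversed(x), reversed(y)):
        total = int(dx) + int(dy) + carry
        out.append(str(total % 2))   # digit written in this position
        carry = total // 2           # digit 1 transferred to the next power of 2
    if carry:
        out.append("1")
    return "".join(reversed(out))

a, b = "1100", "1011"                          # 12 and 11
print(add_binary(a, b))                        # 10111
print(format(int(a, 2) * int(b, 2), "b"))      # 10000100
print(format(int(a, 2) - int(b, 2), "b"))      # 1
```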

1-6 Mathematical induction 

Mathematical propositions often involve some fixed integer n, say, in a special 
role and it is desirable to infer the form taken by the proposition for arbitrary 
integral n from the form taken by it for the specific value n = n₁. The logical 
method by which the proof of the general proposition, if true, may be 
established, is based on the properties of natural numbers and is called 
mathematical induction. 

In brief, it depends for its success on the obvious fact that if A is some set 
of natural numbers and 1 ∈ A, then the statement that whenever integer 
n ∈ A, so also does its successor, implies that A = N, the set of natural 
numbers. 

The formal statement of the process of mathematical induction is expressed 
by the following theorem where, for simplicity, the mathematical proposition 
corresponding to integer n is denoted by S(n). 

Theorem 1-5 (mathematical induction) If it can be shown that, 

(a) when n = n₁, the proposition S(n₁) is true, 

and 

(b) if for n ≥ n₁, when S(n) is true then so also is S(n + 1), 

then the proposition S(n) is true for all natural numbers n ≥ n₁. 

A simple illustrative example will help here and we now prove inductively 
that the sum \(\sum_{r=1}^{n} r\) of the first n natural numbers is given by 
n(1 + n)/2. In other words, in this example the proposition denoted by S(n) is 
that the following result is true: 

\[ 1 + 2 + \cdots + n = n(1 + n)/2. \]

Proof, step (a) First the proposition must be shown to be true for some 
specific value n = n₁. Any integral value n₁ will suffice but if we set n₁ = 1 
the proposition corresponding to S(1) is immediately obvious. If, instead, we 
had chosen n₁ = 3, then it is easily verified that proposition S(3) is true, 
namely that 1 + 2 + 3 = 3(1 + 3)/2. 

Proof, step (b) We must now assume that proposition S(n) is true and 
attempt to show that this implies that the proposition S(n + 1) is true. If 
S(n) is true then 

\[ 1 + 2 + \cdots + n = n(1 + n)/2 \]

and, adding (n + 1) to both sides, we obtain 

\[
1 + 2 + \cdots + n + (n + 1) = n(1 + n)/2 + (n + 1)
= (n + 1)(2 + n)/2.
\]

However, this is simply a statement of proposition S(n + 1) obtained by 
replacing n by n + 1 in proposition S(n). Hence S(l) is true and S(n) 
=> S(n + 1) so, by the conditions of Theorem 1-5, we have established that 
S(n) is valid for all n. 
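
The two steps of the inductive proof can be mirrored by a short numerical check. The Python sketch below (function names are our own) compares the direct sum with n(1 + n)/2 and also verifies, for a range of n, that adding n + 1 to the formula for n gives the formula for n + 1. It illustrates the argument but is not a substitute for it.

```python
def sum_first(n):
    """1 + 2 + ... + n computed directly."""
    return sum(range(1, n + 1))

def closed_form(n):
    """The proposed value n(1 + n)/2."""
    return n * (1 + n) // 2

# Step (a): the base case S(1).
print(sum_first(1) == closed_form(1))

# Step (b): closed_form(n) + (n + 1) equals closed_form(n + 1).
print(all(closed_form(n) + (n + 1) == closed_form(n + 1) for n in range(1, 1001)))

# Direct comparison for the first 1000 natural numbers.
print(all(sum_first(n) == closed_form(n) for n in range(1, 1001)))
```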

Later we shall use this form of proof in cases less trivial than the above 
example which simply involved establishing the sum of an arithmetic progres- 
sion. 



As another illustration of an inductive argument we now consider the 
determination of the nth term in the sequence of numbers u₀, u₁, u₂, ..., 
defined sequentially by the equation 

\[ u_n = 2u_{n-1} + 1. \tag{1-17} \]

Equations of this form which define a sequence of discrete numbers uₙ 
are called first-order difference equations. It is clear that this difference 
equation provides us with the algebraic rule by which the nth term of the 
sequence may be computed once the first term u₀ has been specified. Generally 
speaking, any rule which specifies the form of computation to be pursued in 
order to arrive at the solution of a given problem is called an algorithm. 

A few moments' experiment will suffice to convince the reader that the 
solution to Eqn (1-17) may be expressed in terms of u₀ by the equation 

\[ u_n = 2^n u_0 + (2^n - 1). \tag{1-18} \]

The initial term u₀ of the sequence is arbitrary and on account of this fact 
such a solution is called a general solution of the first-order difference equation 
(1-17). Once u₀ is specified by requiring that u₀ = C, say, then the solution is 
said to be a particular solution. 

The proof of Eqn (1-18) by induction again proceeds in two parts, with 
the proposition S(n) being that Eqn (1-18) is the solution of Eqn (1-17). 

Proof, step (a) If n = 1, then u₁ = 2u₀ + (2 − 1) = 2u₀ + 1, showing that 
the proposition S(1) is true. 

Proof, step (b) Assuming the proposition S(n) is true, then 

\[
2u_n + 1 = 2[2^n u_0 + (2^n - 1)] + 1
= 2^{n+1} u_0 + (2^{n+1} - 1)
= u_{n+1},
\]

showing that S(n) ⇒ S(n + 1). The result is thus true for all n. 
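
The 'few moments' experiment' can be carried out mechanically. The Python sketch below (function names are our own) iterates the rule uₙ = 2uₙ₋₁ + 1 from a chosen u₀ and compares the result with 2ⁿu₀ + (2ⁿ − 1) for several starting values.

```python
def iterate(u0, n):
    """Apply u -> 2u + 1 a total of n times, starting from u0."""
    u = u0
    for _ in range(n):
        u = 2 * u + 1
    return u

def closed(u0, n):
    """The proposed general solution 2^n u0 + (2^n - 1)."""
    return 2**n * u0 + (2**n - 1)

print([iterate(0, n) for n in range(6)])   # [0, 1, 3, 7, 15, 31]
print(all(iterate(u0, n) == closed(u0, n)
          for u0 in range(-3, 4) for n in range(20)))
```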

To conclude this section, having introduced the notion of a difference 
equation let us take the concept a little further so that it can be used in more 
general circumstances. A homogeneous linear difference equation of order 2 
is a relationship of the form 

\[ u_n + a u_{n-1} + b u_{n-2} = 0, \tag{1-19} \]

where a and b are real constants and uₙ₋₂, uₙ₋₁, uₙ are three consecutive 
members of a sequence of numbers. Given any two consecutive members in 
the sequence, say u₀ and u₁, then Eqn (1-19) provides an algorithm by which 
any other member of the sequence may be computed. 

If we seek a solution uₙ of the form 

\[ u_n = A\lambda^n, \tag{1-20} \]

where A and λ are real constants, then substitution into Eqn (1-19) shows that 

\[ \lambda^2 + a\lambda + b = 0. \tag{1-21} \]

This is called the characteristic equation associated with the difference 
equation (1-19) and shows that solutions of the form of Eqn (1-20) are only 
possible when λ is equal to one of the two roots λ₁ and λ₂ of Eqn (1-21), 
which we assume to be real numbers. If λ₁ ≠ λ₂, then Aλ₁ⁿ and Bλ₂ⁿ are both 
solutions of Eqn (1-19) and it is easy to show that 

\[ u_n = A\lambda_1^n + B\lambda_2^n \tag{1-22} \]

is also a solution, where A and B are arbitrary real constants. This result is 
the general solution of Eqn (1-19). Given specific values for u₀ and u₁, 
A and B can be deduced by substituting into Eqn (1-22) and hence a particular 
solution found. 

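Suppose, for example, that the difference equation was 

\[ u_n - u_{n-1} - u_{n-2} = 0, \]

and that u₀ = u₁ = 1. Then the characteristic equation is 

\[ \lambda^2 - \lambda - 1 = 0, \]

with the two roots λ₁ = (1 + √5)/2 and λ₂ = (1 − √5)/2. Hence the general 
solution has the form 

\[
u_n = A\left(\frac{1 + \sqrt{5}}{2}\right)^n + B\left(\frac{1 - \sqrt{5}}{2}\right)^n. \tag{1-23}
\]

To deduce the values of A and B particular to our problem we use the 
initial conditions u₀ = 1 and u₁ = 1 to deduce from Eqn (1-23) that 

\[ 1 = A + B \qquad (\text{case } n = 0,\ u_0 = 1) \]

\[
1 = A\left(\frac{1 + \sqrt{5}}{2}\right) + B\left(\frac{1 - \sqrt{5}}{2}\right)
\qquad (\text{case } n = 1,\ u_1 = 1).
\]

Solving these equations for A and B we find 

\[ A = \frac{\sqrt{5} + 1}{2\sqrt{5}}, \qquad B = \frac{\sqrt{5} - 1}{2\sqrt{5}}, \]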

whence the particular solution is 

\[
u_n = \frac{\sqrt{5} + 1}{2\sqrt{5}}\left(\frac{1 + \sqrt{5}}{2}\right)^n
+ \frac{\sqrt{5} - 1}{2\sqrt{5}}\left(\frac{1 - \sqrt{5}}{2}\right)^n.
\]

The first few numbers u₀, u₁, u₂, ..., of the sequence generated by this 
algorithm are 

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ..., 

and comprise the well-known Fibonacci sequence of numbers. This sequence 
of numbers occurs naturally in the study of regular solids and in numerous 
other parts of mathematics. Naturally if only the first few members of the 
sequence are required then they are most easily found by use of the algorithm 
itself, which in the form 

\[ u_n = u_{n-1} + u_{n-2} \]

states that each member of the sequence is the sum of its two predecessors. 
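
Both routes to the Fibonacci numbers can be compared numerically. The Python sketch below (function names are our own) generates the sequence from the recurrence with u₀ = u₁ = 1 and evaluates the general solution with A = (√5 + 1)/(2√5) and B = (√5 − 1)/(2√5). The closed form is evaluated in floating point and rounded to the nearest integer, which is an assumption that is safe only for moderate n.

```python
from math import sqrt

def fib_recurrence(count):
    """First `count` members u0, u1, ... from u_n = u_{n-1} + u_{n-2}."""
    seq = [1, 1]
    while len(seq) < count:
        seq.append(seq[-1] + seq[-2])
    return seq[:count]

def fib_closed(n):
    """u_n from the particular solution, rounded to the nearest integer."""
    r5 = sqrt(5)
    A = (r5 + 1) / (2 * r5)
    B = (r5 - 1) / (2 * r5)
    lam1 = (1 + r5) / 2
    lam2 = (1 - r5) / 2
    return round(A * lam1**n + B * lam2**n)

print(fib_recurrence(10))                  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print([fib_closed(n) for n in range(10)])  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```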

It is not difficult to see that if the roots of the characteristic equation 
(1-21) are equal so that λ₁ = λ₂ = μ, say, then Aμⁿ is a solution of Eqn (1-19). 
In terms of Eqn (1-19) this is equivalent to saying that a² = 4b and μ = −a/2. 
However Aμⁿ cannot be the general solution since it only involves one 
arbitrary constant A, and it is necessary to have two such constants in the 
general solution to allow the specification of the initial conditions u₀ and u₁. 
The difficulty is easily resolved once we notice that nBμⁿ, with B an arbitrary 
real constant, is also a solution of Eqn (1-19). This is easily verified by direct 
substitution. For then we have for the general solution in the case of equal 
roots in the characteristic equation, 

\[ u_n = (A + nB)\mu^n. \tag{1-24} \]

To illustrate this situation, suppose that we are required to solve the 
difference equation 

\[ u_n = 6u_{n-1} - 9u_{n-2} \]

subject to the initial conditions u₀ = 1, u₁ = 2. Then the characteristic 
equation becomes 

\[ \lambda^2 - 6\lambda + 9 = 0, \]

with the double root λ = 3. From Eqn (1-24) the general solution must thus be 

\[ u_n = (A + nB)\,3^n. \]

Using the initial conditions u₀ = 1, u₁ = 2, then, shows that 

\[ 1 = A \quad \text{and} \quad 2 = 3(A + B), \]

so that B = −1/3 and the particular solution to the problem in question is 

\[ u_n = \left(1 - \tfrac{1}{3}n\right)3^n. \]
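
The equal-root example can be checked exactly with rational arithmetic. The Python sketch below (function names are our own) takes the initial conditions u₀ = 1, u₁ = 2, generates the sequence from uₙ = 6uₙ₋₁ − 9uₙ₋₂, and compares it with the particular solution (1 − n/3)3ⁿ.

```python
from fractions import Fraction

def by_recurrence(count):
    """First `count` members from u_n = 6u_{n-1} - 9u_{n-2}, u0 = 1, u1 = 2."""
    seq = [Fraction(1), Fraction(2)]
    while len(seq) < count:
        seq.append(6 * seq[-1] - 9 * seq[-2])
    return seq[:count]

def by_formula(n):
    """The particular solution (1 - n/3) * 3^n."""
    return (1 - Fraction(n, 3)) * 3**n

print([int(u) for u in by_recurrence(6)])        # [1, 2, 3, 0, -27, -162]
print(by_recurrence(12) == [by_formula(n) for n in range(12)])   # True
```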

PROBLEMS 

Section 1-1 

1-1 Enumerate the elements in the following sets in which I signifies the set of 
natural positive and negative integers including zero: 

(a) S = {n | n ∈ I, 5 < n² < 47}; 

(b) S = {n³ | n ∈ N, 15 < n² < 40}; 

(c) S = {(m, n) | m, n ∈ I, 12 < m² + n² < 18}; 

(d) S = {(m, n, m + n) | m, n ∈ N, 45 < m² + n², 3 < m + n < 9}; 

(e) S = {x | x ∈ N, x² + 10x − 11 = 0}. 

1-2 Express the following sets in the notation of the previous question: 

(a) the set of positive integers whose cubes lie between 7 and 126; 

(b) the set of integers which are the squares of the integers lying between M 
and N (0 < N < M); 

(c) the points in the plane that lie between circles of radii 1 and 3 drawn about 
the origin and which have x-coordinates greater than 0.5. 

1-3 Give an example of 

(a) a finite set having numerical elements, 

(b) a finite set having non-numerical elements, 

and in each case give an example of a proper subset. 

1-4 Give an example of 

(a) a set of ordered triples involving numerical quantities, 

(b) a set of ordered triples involving non-numerical quantities, 
and in each case give an example of a proper subset. 

1-5 State the relationships between the sets A and B if: 

(a) A = N, B = {2n | n ∈ N}; 

(b) A = {sin x | x = (1 + 12n)π/6, n ∈ N}, B = {½}; 

(c) A = {1, 2, 3, 4}, B = {5, 7, 9, 11}. 

1-6 Form the union, intersection, and the complement of B relative to A of the 
sets A and B if: 

(a) A = N, B = {2n | n ∈ N}; 

(b) A = {a, b, c, 0, 2, 4}, B = {d, e, f, 1, 3, 6, 7}; 

(c) A = {1, √2, 2, 3, √5, 6}, B = {0, √2, √5}. 

1-7 Construct Venn diagrams for the union and intersection of the sets A and B if: 

(a) A is the set of points interior to the unit square (that is, square having 
side of unit length) with one corner at the origin and lying entirely in the 
first quadrant, and B is the set of points exterior to the unit circle centred 
on the origin; 

(b) A is the set of points interior to the isosceles triangle of unit side with its 
centre of gravity at the origin and a side parallel to the x-axis, and B is 
the unit square having its centre at the origin and a side parallel to the 
x-axis. 

1-8 Represent by points on a graph the 36 possible outcomes of throwing two 
dice, each with faces numbered 1 to 6. Identify the set of points at which the 
sum of the scores on the two dice is greater than or equal to 7. 

1-9 By using Venn diagrams, prove Eqns (1-6) and (1-7) for sets which may be 
represented by points in the plane. 

1-10 Complete the details of the analytical proof of the first stated result of 
Theorem 1-1. 

1-11 Illustrate by means of a Venn diagram the result A\(B ∩ C) = (A\B) ∪ (A\C) 
of Theorem 1-1. 

1-12 The expression (A\B) ∪ (B\A) is called the symmetric difference of sets A and 
B. Illustrate the result by means of a Venn diagram and show that 

(A\B) ∪ (B\A) = (A ∪ B)\(A ∩ B). 

1-13 Prove analytically that A ∩ B = A\(A\B) and illustrate your result by means 
of a Venn diagram. 

1-14 In the following expressions, replace the symbol * by ⇐, by ⇒ or by ⇔ to 
make them valid logical statements concerning the sets A, B and an element x: 

(a) x ∈ A * x ∈ A ∪ B; 

(b) x ∈ B * x ∈ A ∪ B; 

(c) x ∈ A * x ∈ A ∩ B; 

(d) x ∈ A or x ∈ B or x ∈ A ∩ B * x ∈ A ∪ B; 

(e) x ∈ A or x ∈ B, x ∉ A ∩ B * x ∈ (A ∪ B)\(A ∩ B). 

Give one example each of the use of ⇒ and ⇔. 

1-15 If * is a set operation and it is true that (A * B) * C = A * (B * C), then the 
operation * is said to be associative. Use a Venn diagram to prove that 

(a) (A ∪ B) ∪ C = A ∪ (B ∪ C) ⇔ A ∪ B ∪ C; 

(b) (A ∩ B) ∩ C = A ∩ (B ∩ C) ⇔ A ∩ B ∩ C. 

Section 1-2 

1-16 Toss a coin 50 times and plot the relative frequency of 'heads'. 

1-17 Suggest a graphical representation for the sample space in which the outcome 
of tossing three coins might be recorded. 

1-18 Suggest a graphical representation for the sample space characterizing the 
score recorded in a trial involving the tossing of a die together with a coin 
which has faces numbered 1 and 2. Give examples of: 

(a) two disjoint subsets of the sample space; 

(b) two intersecting subsets of the sample space, indicating the points in their 
intersection. 

1-19 By using Eqn (1-9) explain why 

P(A ∩ B) = P(B) P(A | B) = P(A) P(B | A). 

Verify your result by computing P(A), P(B), P(A | B), P(B | A), and P(A ∩ B) 
using the sets defined in connection with Fig. 1-6 (b). 

1-20 Use a Venn diagram to prove the generalized probability addition rule 

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) 
               − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C). 

1-21 Use Theorem 1-2 to prove the generalized probability multiplication rule 

P(A ∩ B ∩ C) = P(A) P(B | A) P(C | A ∩ B). 

1-22 Complete the argument in Example 1-2 (a). 

1-23 A bag contains 30 balls of which 5 are red and the remainder are black. A 
trial comprises drawing a ball from the bag at random, recording the result 
and then replacing the ball and shaking the bag. This process is called sampling 
with replacement. If this process is repeated twice, what is the probability of 
selecting 

(a) 2 red balls; 

(b) 2 black balls; 

(c) 1 red and 1 black ball? 



1-24 By considering arrangements of the five letters A, B, C, D, E verify that 
⁵P₂ = 20 and \(\binom{5}{2}\) = 10. 

1-25 How many blends of coffee comprising equal quantities of 4 different types of 
coffee bean are possible if 9 different types of coffee bean are available? 

1-26 A game involves a team of 5 persons who play sequentially. How many 
different teams may be drawn up if 10 players are available? 

1-27 On the assumption that a participant in a raffle will buy either 2 or 4 numbered 
tickets, how many different sets of tickets may he choose from a book of 
20 tickets? 

1-28 A coin is biased so that the probability of 'heads' is 0.52. What is the 
probability that: 

(a) 3 heads will occur in 6 throws; 

(b) 3 or more heads will occur in 6 throws? 

1-29 Shells fired from a gun have a probability \ of hitting the target. What is the 
probability of missing the target if 4 shells are fired? 

1-30 Draw the probability density function for the binomial distribution in which 
p = J and n = 6. Use your result to draw the corresponding cumulative 
distribution function. 

1-31 By considering Fig. 1-6 (a) deduce and draw the probability density function 
describing the sum of the scores on the two dice. 

Section 1-3 

1-32 Describe two different ways of defining N rational numbers between 1 and 2. 
Generalize one of these methods to interpolate N rationals between any two 
rationals a and b. 

1-33 Working from the array of rational numbers given in Section 1-3, use arrows 
to suggest two alternative schemes to the one already described by which all 
the rational numbers may be enumerated. Is this array the only possible one 
that may be used ? If not, give an alternative. 

1-34 Use the fact that $\sqrt{2}$ is irrational to prove that if a is a non-zero rational
number, then $a + \sqrt{2}$, $a\sqrt{2}$ and $\sqrt{2}/a$ are also irrational. Would the results
still be true if $\sqrt{2}$ were replaced by any other irrational number, and would
your proof still suffice?

1-35 Prove that $\sqrt{3}$ is irrational. (Hint: first assume that $\sqrt{3}$ is rational and equal
to p/q, and then obtain a contradiction by considering even and odd values of
q separately.)

1-36 The operation of division is defined in terms of multiplication as indicated in
the following problem. The reader is required to provide the justification for
some familiar arithmetic operations using only the operation of multiplication
and the definition provided. Given that a and b are real numbers and that
$b \neq 0$, we define a/b by k = a/b if, and only if, kb = a. Does this define a/b
uniquely? Why is it necessary that $b \neq 0$? Show that a/b = ca/cb whenever
$c \neq 0$ and that a/b + c/d = (ad + bc)/bd; (a/b)(c/d) = ac/bd; 1/(a/b) = b/a
$(a \neq 0)$.






1-37 Prove the following statements concerning real numbers by directly applying 
the real number axioms: 

(a) There is just one zero element and one unit element;

(b) $a + \xi = a + \eta \Rightarrow \xi = \eta$;

(c) $0 \cdot a = a \cdot 0 = 0$;

(d) $a\xi = a\eta$ and $a \neq 0 \Rightarrow \xi = \eta$;

(e) $(-a)b = a(-b) = -(ab)$; $(-a)(-b) = ab$;

(f) $ab = 0 \Rightarrow a = 0$ or $b = 0$;

(g) $a(b - c) = ab - ac$.

1-38 The expression $\{a_r\}_{r=1}^{n}$ denotes the sequence of numbers $a_1, a_2, \ldots, a_n$.
Given that $\{a_r\}_{r=1}^{7}$ = 0.2, 3, 1.8, 2.2, 1, 3, 2 and $\{b_r\}_{r=1}^{7}$ = 0.3, 2, 1.8, 1.1, 2,
4, 1 verify the inequality of Example 1-5.

1-39 Prove that if a > b > 0 and k > 0 then

$\frac{b}{a} < \frac{b + k}{a + k} < 1 < \frac{a + k}{b + k} < \frac{a}{b}.$

1-40 Indicate by means of a diagram the intervals defined by the following expres- 
sions, using a dot to signify an end point belonging to an interval and a circle 
to indicate an end point excluded from the interval : 

(a) $(x + 2)(x + 3) < (x - 1)(x - 2)$;

(b) $0 < |x - 3| < 1$;

(c) $|x| < 2$;

(d) $0 < |2x + 1| < 1$;

(e) $|3x + 1| > 2$;

(f) £±J> < x 



2^+2 2(x - 1) 

1-41 Identify the regions in the (x, y)-plane determined by the following inequalities.
Mark a boundary that belongs to the region by a full line; a boundary that 
does not by a dotted line; an end point that is included in an interval by a dot; 
an end point excluded from an interval by a circle: 

(a) x 2 + y 2 < 1 ; x < 0; y < —x; 

(b) y < sin x; x 2 + v 2 > "- 2 ; y < I ; 

(c) ix 2 + v 2 > l; \y\<i; 

(d) y >x 2 ; \x - 1 | < 1; y <A. 

1-42 Give numerical examples to illustrate Theorem 1-4. 

1-43 Prove Theorem 1-4(b) by considering separately the cases a > 0, b > 0;
a < 0, b < 0; a > 0, b < 0; a < 0, b > 0.

1-44 Express these numbers in binary notation: 
(a) 27; (b) lyV; (c) 2-Jf ; (d) i», 

1-45 Express the following numbers in binary notation, and then use your results 
to form the expressions a + b, a - b, and ab. Check by interpreting the results
in terms of the base 10: 
(a) a = 12, ft = 11 ; (Jo) a = 3iV, * = J; (O a = j», ft = ,'„. 

1-46 Give numerical examples to illustrate Theorem 1-4 using binary notation. 

1-47 Using the number system to base 3 and the digits 0, 1, 2 represent these 
numbers: 






(a) 27; (b) 2|; (c) 2J,; (d) 11 

1-48 Using the number system to base 3 and the digits 0, 1, 2 write out the addition 
and multiplication table for three digits analogous to those of Section 1-5. 

1-49 Express the following numbers in terms of the base 3 and use the tables of the 
previous problem to evaluate a + b, a — b, and ab. Check by re-interpreting 
your results in terms of the base 10. 
(a) a = 4, b = 2J; (b) a = 3, b = £; (c) a = •£, b = - 2 V- 

1-50 Give an inductive proof that

(a) $\sum_{r=0}^{n-1} (a + rd) = \frac{n}{2}\,[2a + (n - 1)d]$; (Arithmetic Progression)

(b) $\sum_{r=1}^{n} r^2 = \frac{n(n + 1)(2n + 1)}{6}$. (Sum of Squares)

1-51 The expansion

$(a + b)^n = a^n + n a^{n-1} b + \frac{n(n - 1)}{2!}\, a^{n-2} b^2 + \cdots + n a b^{n-1} + b^n,$

and the equivalent result

$(a + b)^n = \sum_{r=0}^{n} \binom{n}{r} a^r b^{n-r},$

are called the binomial expansion. Prove the result inductively for the case
when n is a natural number.

1-52 Give an inductive proof of the results

(a) $\sum_{s=0}^{n-1} r^s = \frac{1 - r^n}{1 - r}$ $(r \neq 1)$; (Geometric Progression)

(b) $\sum_{r=1}^{n} r^3 = \left[\frac{n(n + 1)}{2}\right]^2$. (Sum of Cubes)

1-53 Find the general solution to the difference equation

$u_n + u_{n-1} - 6u_{n-2} = 0.$

Determine the particular solution corresponding to $u_1 = 1$, $u_2 = 1$.

1-54 Find the general solution to the difference equation

$u_n - 3u_{n-1} + 2u_{n-2} = 0.$

Determine the particular solution corresponding to $u_1 = 3$, $u_2 = 7$.

1-55 Find the solution to the difference equation

$u_n - 2u_{n-1} + u_{n-2} = 0$

given that $u_1 = 2$, $u_2 = 3$.

1-56 Find the general solution to the difference equation

$u_n - 6u_{n-1} + 9u_{n-2} = 0.$

Determine the particular solution corresponding to $u_1 = 1$, $u_2 = -3$.



Variables, functions, and 
mappings 



2-1 Variables and functions 

In the physical world the idea of one quantity depending on another is very 
familiar, a typical example being provided by the observed fact that the 
pressure of a fixed volume of gas depends on its temperature. This situation 
is reflected in mathematics by the notion of a function, which we shall now
discuss in some detail. 

The modern definition of a function in the context of real numbers is 
that it is a relationship, usually a formula, by which a correspondence is 
established between two sets A and B of real numbers in such a manner 
that to each number in set A there corresponds only one number in set B. 
The set A of numbers is the domain of the function and the set B of numbers 
is the range of the function. 

If the function or rule by which the correspondence between numbers in
sets A and B is established is denoted by f, and x denotes a typical number in
the domain A of f, then the number in the range B to be associated with x
by the function f is written f(x) and is read 'f of x'. The numbers x and f(x)
are variables with x being given the specific name independent variable and
f(x) the name dependent variable. The independent variable is also often
called the argument of the function f.

It is often helpful to construct the graph of f which mathematically is
the set of ordered number pairs (x, f(x)), where x belongs to the domain of f.
Geometrically the graph of f is usually represented by a plane curve, drawn
relative to an origin defined by the intersection of two perpendicular straight
lines called axes. The process of construction is as follows. A distance
proportional to x is measured along one axis and a distance proportional to f(x)
along the other axis. Through each resulting point on an axis is then drawn a
line parallel to the other axis and these two perpendicular lines intersect at a
unique point in the plane of the axes. This point of intersection is the point
(x, f(x)) and the graph of f is defined to be the locus or curve formed by
joining up all such points corresponding to the domain of f, as illustrated
by Fig. 2-1.

However, it is not necessary to use axes of this type, called rectangular
Cartesian axes, and any other geometrical representation which gives a unique
representation of the points (x, f(x)) would serve equally well. Thus the
axes could be inclined at an angle $\alpha \neq \frac{1}{2}\pi$ and the scale of measurement along
them need not be uniform. For example, it is often useful to plot the logarithm
of x along the x-axis, rather than x itself. This compresses the x scale so that
large values of x may be conveniently displayed on the graph together with
small values. Another possible representation involves the use of curved
reference axes and leads to curvilinear coordinates. This will be taken up
again later in connection with conformal mapping.

Not every function can be represented in the form of an unbroken curve,
and the function

$f(x) = \begin{cases} 0 & \text{when } x \text{ is rational}, \\ 1 & \text{when } x \text{ is irrational}, \end{cases}$

provides an extreme example of this situation. Here, although the graph 
would look like a line parallel to the x-axis on which all points have the 
value unity, in reality the infinity of points with rational x-coordinates would 
be missing since they lie on the x-axis itself. The domain is all the real numbers 
R and the range is just the two numbers zero and unity. 

Because f transforms one set of real numbers into another set of real
numbers a function is sometimes spoken of as a transformation between
sets of real numbers. On account of the restriction to real numbers or, more
explicitly, to real variables, the function f(x) is called a function of one real
variable. Another name that is often used for a function is a mapping of some
set of real numbers into some other set of real numbers. This name is of course
suggested by the geometrical illustration of the graph of a function and we
shall return more than once to the notion of a mapping. In this terminology,
f(x) is referred to as the image of x under the mapping f.

Since the domain and range of f occur as intervals on the x- and y-axes,
it is convenient to use a simplified notation to identify the form of the interval
that is involved. We now adopt the almost standard notation summarized
below in which a round bracket indicates an open end of an interval, and a
square bracket indicates a closed end of an interval:

$(a, b) \Leftrightarrow a < x < b,$
$[a, b] \Leftrightarrow a \le x \le b,$
$(a, b] \Leftrightarrow a < x \le b,$
$[a, b) \Leftrightarrow a \le x < b,$
$(-\infty, a] \Leftrightarrow x \le a,$
$[a, \infty) \Leftrightarrow a \le x,$
$(-\infty, \infty) \Leftrightarrow$ all $x \in R$.

As the definition of open and closed intervals is only a matter of considering
the behaviour of the end points, we shall define the length of all the intervals
(a, b), [a, b), (a, b], and [a, b] to be the number $b - a$. This is consistent with
the obvious result that the length of an 'interval' comprising only one point
is zero.
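
The bracket notation and the definition of length can be illustrated by a short
Python sketch. The class name Interval and its fields are assumptions introduced
purely for illustration; they do not come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """An interval with end points a <= b; each end is open or closed."""
    a: float
    b: float
    closed_left: bool = True    # True models '[', False models '('
    closed_right: bool = True   # True models ']', False models ')'

    def contains(self, x: float) -> bool:
        # A closed end admits the end point itself, an open end does not.
        left_ok = x >= self.a if self.closed_left else x > self.a
        right_ok = x <= self.b if self.closed_right else x < self.b
        return left_ok and right_ok

    def length(self) -> float:
        # The length b - a ignores whether the end points are included.
        return self.b - self.a


half_open = Interval(1, 3, closed_left=False)   # models (1, 3]
single_point = Interval(2, 2)                   # models [2, 2]
```

Here half_open.contains(3) is true while half_open.contains(1) is false, and
single_point.length() is zero, in agreement with the remark about an 'interval'
comprising only one point.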




Fig. 2-1 Domain, range, and graph of f(x).



It may happen that when x lies within some interval, as for example the
interval (b, c] in Fig. 2-1, each point x is associated with a unique image point
f(x) and, conversely, each image point f(x) is associated with a unique point x.
Such a mapping or function f is then said to be one-one in the domain in
question.

However, there is another possibility that can arise and that is that in some
interval of the x-axis, more than one point x may correspond to the same
image point f(x). This is again well illustrated by Fig. 2-1 if now we consider
the interval [a, b] and the points $x_2$ and $x_3$, both of which have the same image
point since $f(x_2) = f(x_3)$. In situations such as these the mapping or function
f is said to be many-one in the domain in question.

A specific example might help here and we choose for f the function
$f(x) = x^2$ and the two different domains [0, 3] and [-1, 3]. A glance at
Fig. 2-2 shows that f maps the domain [0, 3] onto the range [0, 9] one-one,
but that it maps the domain [-1, 3] onto the same range [0, 9] many-one.
Expressed another way, the range [0, 1] shown as a solid line in the figure is
mapped twice by points in the domain [-1, 3]; once by points in the sub-domain
$-1 \le x \le 0$ and once by points in the sub-domain $0 \le x \le 1$.
Again considering the domain [-1, 3], the function $f(x) = x^2$ maps the sub-domain
$1 < x \le 3$ onto the range (1, 9] one-one.
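
The distinction between the two domains can be checked numerically on a finite
sample of points. The following Python sketch is only an illustration: the names
is_one_one, square, domain_a and domain_b are assumptions, and a finite sample
can expose a many-one mapping but cannot by itself prove that a mapping is
one-one on a whole interval.

```python
def is_one_one(f, points):
    """Return True if no two distinct sample points share the same image."""
    seen = {}                      # image value -> sample point that produced it
    for x in points:
        y = f(x)
        if y in seen and seen[y] != x:
            return False           # two different x give the same f(x)
        seen[y] = x
    return True


def square(x):
    return x * x


# Quarter steps are exactly representable, so the squares compare exactly.
domain_a = [k / 4 for k in range(0, 13)]     # sample of [0, 3]
domain_b = [k / 4 for k in range(-4, 13)]    # sample of [-1, 3]

one_one_on_a = is_one_one(square, domain_a)
one_one_on_b = is_one_one(square, domain_b)
```

On these samples one_one_on_a is true and one_one_on_b is false, because the
points -1 and 1 of the second sample have the same image.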

Fig. 2-2 Example of many-one mapping in shaded range and a one-one mapping
in the hatched range.

In many older books the term function is used ambiguously in that it is
sometimes applied to relationships which do not comply with our definition
of a function. The most familiar example of this is the 'function' $y = \sqrt{x}$,
which fails to comply with our definition because to every positive x there
correspond two values for y, namely the positive and negative square roots of
x which are equal in magnitude but opposite in sign. A mapping of this kind
is one-many in the sense that to one value of x there corresponds more than
one image point f(x), and although it is permissible to describe this relationship
as a mapping, it is incorrect to term it a function.

Nevertheless, the square root operation is fundamental to mathematics
and we must find some way to make it and similar ones legitimate. The
difficulty is easily resolved if we consider how the square root is used in
applications. In point of fact two different relationships are always considered
which together are equivalent to $y = \sqrt{x}$. These are $y_1 = +\sqrt{x}$ and
$y_2 = -\sqrt{x}$, where the square root is always to be understood to denote the
positive square root and the sign identifies the relationship being considered.
Each of the mappings $y_1(x)$ and $y_2(x)$ of the domain $(0, \infty)$ is one-one as
Fig. 2-3 shows, so that they may each be correctly termed a function, the
particular one to be used in any application being determined by other
considerations, such as that the result must be positive or negative. These ideas
will arise again later in connection with inverse functions.







Fig. 2-3 The square root function. 



In general, if the domain of a function f is not specified then it is understood
to be the largest interval on the x-axis for which the function is defined. So if
$f(x) = x^2 + 4$, then as this is defined for all x, the largest possible domain
must be $(-\infty, \infty)$. Alternatively the function $f(x) = +\sqrt{4 - x^2}$ is only
defined in terms of real numbers when $-2 \le x \le 2$ showing that the largest
possible domain is [-2, 2]. Similarly, the function f(x) = 1/(1 - x) is
defined for all x with the sole exception of x = 1 so that the largest possible
domain is the entire x-axis with the single point x = 1 deleted from it.

A function need not necessarily be defined for all real numbers on some
interval and, as in probability theory, it is quite possible for the dependent
and independent variables to assume only discrete values. Thus the rule which
assigns to any positive integer n the number of positive integers whose squares
are less than n, defines a perfectly good function. Denoting this function by
f we have for its first few values f(1) = 0, f(2) = 1, f(3) = 1, f(4) = 1,
f(5) = 2, f(6) = 2, f(7) = 2, f(8) = 2, f(9) = 2, f(10) = 3, .... Clearly,
both its domain and its range are the set N of natural numbers and the
mapping is obviously many-one.
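
The values quoted for this discrete function are easily reproduced. The Python
sketch below follows the stated rule directly; the function name
count_squares_below is an assumption made for illustration.

```python
def count_squares_below(n: int) -> int:
    """Number of positive integers k whose squares are less than n."""
    count = 0
    k = 1
    while k * k < n:       # stop at the first k whose square reaches n
        count += 1
        k += 1
    return count


# The first ten values f(1), ..., f(10) of the function in the text.
values = [count_squares_below(n) for n in range(1, 11)]
```

The list values agrees term by term with f(1), ..., f(10) as given above, and
the repeated entries show at once why the mapping is many-one.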

Before examining some special functions let us formulate our definition 
of a function in rather more general terms. This will be useful later since 
although in the above context the relationships discussed have always been 
between numbers, in future we shall establish relationships between quantities 
that are not simply real numbers. When we do so, it will be valuable if we 
can still utilize the notion of a function. This will occur, for example, when 
we establish correspondence between quantities called vectors which although 
obeying algebraic laws are not themselves real numbers. 

The idea of a relationship between arbitrary quantities is one which we
have already started to examine in the previous chapter in connection with
sets. As might be expected, set theory provides the natural language for the
formulation and expression of general ideas associated with functions, and
indeed we have already used the word 'set' quite naturally when thinking of a
set of numbers. A more general definition follows.

Definition 2-1 A function f is a correspondence, often a formula, by
which each element of set A, which is called the domain of f, is associated
with only one element of set B called the range of f.
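
Definition 2-1 can be tested mechanically when a correspondence is given as a
finite collection of ordered pairs. The Python sketch below is an illustration
only: the name is_function and the sample data are assumptions, not part of the
text.

```python
def is_function(pairs, domain):
    """True if every element of the domain is paired with exactly one image."""
    images = {}
    for x, y in pairs:
        images.setdefault(x, set()).add(y)   # collect all images of each x
    return all(x in images and len(images[x]) == 1 for x in domain)


# Hypothetical data: each person has one birthplace (many-one, but a function).
birthplace = [("Ann", "Bath"), ("Bob", "Bath"), ("Cyril", "Leeds")]

# The relation 'y is a square root of x' gives two images for each positive x.
square_roots = [(4, 2), (4, -2), (9, 3), (9, -3)]

birthplace_ok = is_function(birthplace, {"Ann", "Bob", "Cyril"})
square_roots_ok = is_function(square_roots, {4, 9})
```

Here birthplace_ok is true although two people share a birthplace, whereas
square_roots_ok is false because the correspondence is one-many.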

To close this section we now provide a few examples illustrating some of 
the ideas just mentioned. 

Example 2-1 The function y = f(x) defined by the rule

$f(x) = \frac{1}{(x - 1)(x - 2)}$

is defined for all real x with the exception of the two points x = 1 and x = 2.
The domain of f is thus the set of real numbers R with the two numbers 1 and
2 deleted. In set notation the domain is $R \setminus \{1, 2\}$. The two lines x = 1 and
x = 2 shown dotted in Fig. 2-4 are called asymptotes to the graph of f and
although the graph approaches arbitrarily close to the asymptotes it never
coincides with them.

Fig. 2-4 Graph of $y = 1/[(x - 1)(x - 2)]$ showing the asymptotes x = 1 and x = 2.






Example 2-2 A discrete valued function may be defined by a table which is
simply an arrangement of ordered number pairs in a sequence.

Table 2-1

x       0      1      3      7
f(x)    2.1    4.2    10     6.3

Example 2-3 One possible system of curvilinear coordinates in the first 
quadrant may be defined as follows. Using Cartesian coordinates, construct 
the set of curves y = a/x and the set of straight lines y = mx, each with
domain $(0, \infty)$ and with a > 0, m > 0. Representative examples of these
curves are shown in Fig. 2-5 (a) for the stated values of a and m. 





Fig. 2-5 (a) Families of curves y = a/x and y = mx; (b) curvilinear coordinates.



In general, any set of curves such as either of these which is derivable 
from the same equation by a suitable choice of constant is called a family of
curves, and the constant which is fixed for any one curve but which varies 
from curve to curve, is called a parameter. This term parameter will often be 
used in contexts which do not involve families of curves, but in every case it 
will be used as here in the sense that it implies a 'variable constant'. 

Next we disregard the Cartesian axes and the manner of construction of 
the two families of curves and regard the two families of curves themselves 
as defining new coordinate lines as in Fig. 2-5 (b). Each member of the family 
of rectangular hyperbolas will then define a line along which a is constant, 
no two members of the family either intersecting or having the same value of 
a. Similarly, each member of the family of straight lines through origin 0, 
collectively called a pencil of lines, is characterized by a different value of m. 




Apart from the single point through which all the straight lines pass and 
appropriately called a singular point, there is no ambiguity as to the values of 
a and m to be associated with any point in the region of the plane defined 
by the two families. We shall use the quantities a and m as our new coordinates 
for a point. Graphs may now be constructed using the two families of curves 
as curvilinear coordinates. The intersection of a hyperbola and straight line 
will define a point in the plane with coordinates given by the ordered number 
pair (a, m). Thus the points A, B, and C in Fig. 2-5 (b) have curvilinear
coordinates (J, 1), (1, J), and (2, 4), respectively. 

Naturally the graph of the function $y = x^2$ with domain $(0, \infty)$ would
look different when plotted first in Cartesian coordinates and then in these
curvilinear coordinates by setting $a \equiv x$ and $m \equiv y$. They would however be
two different geometrical representations of the same function. Here we have
made use of the useful symbol $\equiv$, which is read 'identically equal to'.
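
The correspondence between a point of the first quadrant and its curvilinear
coordinates in Example 2-3 may be made explicit. A point (x, y) lying on both
y = a/x and y = mx must have a = xy and m = y/x, and conversely $x = \sqrt{a/m}$,
y = mx. The Python sketch below records these relations; the function names
are assumptions introduced for illustration.

```python
import math


def to_curvilinear(x: float, y: float):
    """Map (x, y) with x > 0, y > 0 to (a, m), where y = a/x and y = m*x."""
    if x <= 0 or y <= 0:
        raise ValueError("defined only in the open first quadrant")
    return x * y, y / x


def to_cartesian(a: float, m: float):
    """Intersection of the hyperbola y = a/x with the line y = m*x."""
    if a <= 0 or m <= 0:
        raise ValueError("requires a > 0 and m > 0")
    x = math.sqrt(a / m)       # from a/x = m*x with x > 0
    return x, m * x


a, m = to_curvilinear(1.0, 2.0)    # the point x = 1, y = 2
x, y = to_cartesian(a, m)          # recovers the original point
```

The origin is excluded because every line of the pencil passes through it, so
that no single value of m can be assigned there.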

Example 2-4 This example is a final illustration of our more general definition
of a function. Take as the domain of the function f the set A of all
people, and as the range B of the function f the set of all towns in the world.
Then for the function f we propose the rule that assigns to every person his
place of birth.

Clearly this function defines a many-one mapping of set A onto set B, 
since although a person can only be born in one place, many other people 
may have the same place of birth. This example also serves to distinguish 
clearly between the concept of a 'function' which is the rule of assignment, 
and the concept of the 'variables' associated with the function which here are 
people and places. 

2-2 Inverse functions 

In the previous section we remarked that a typical example of a correspon- 
dence between physical quantities was the observed fact that the pressure of a 
fixed volume of gas depends on its temperature. Expressed in this form we are 
implying that the dependent variable is the pressure p and the independent 
variable is the temperature T, so that the law relating pressure to temperature 
has the general form 

$p = \phi(T)$, (A)

where $\phi$ is some function that is determined by experiment.

However, we know from experience that in thermodynamics it is often 
necessary to interchange these roles of dependent and independent variables 
and sometimes to regard the temperature T as the dependent variable and the 
pressure p as the independent variable, when the temperature-pressure law 
then has the form 

$T = \psi(p)$, (B)

where, naturally, the function $\psi$ is dependent on the form of the function $\phi$.
Indeed, formally, $\phi$ and $\psi$ must obviously satisfy the identity $\phi[\psi(p)] = p$
for all pressures p in the domain of $\psi$.

The relationships (A) and (B) are particular cases of the notion of a 
function and its inverse and the idea is successful in this context because the 
correspondence between temperature and pressure is known to be one-one. 

Consider a general case of a function

y = f(x) (2-1)

that is one-one and defined on the domain [a, b], together with its inverse

x = g(y) (2-2)

which has for its domain the interval [c, d] on the y-axis.




Fig. 2-6 (a) Inversion through the graph of f(x); (b) inversion by reflection in y = x.

Graphically the process of inversion may be accomplished point by point 
as indicated in Fig. 2-6 (a). This amounts to selecting a point y in [c, d] and 
then finding the corresponding point x in [a, b] by projecting horizontally 
from y until the graph of f is intercepted, after which a projection is made
vertically downwards from this intercept to identify the required point on the 
x-axis. 

The relationship between a function and its inverse is represented in
Fig. 2-6 (b). In this diagram we have used the fact that when a function is
represented as an ordered number pair, interchange of dependent and independent
variables corresponds to interchange of numbers in the ordered
number pair. The lower curve represents the function y = f(x) and the upper
curve represents the function y = g(x), with the function g inverse to f;
both graphs being plotted using the same axes. The line y = x is also shown
on the graph to emphasize that geometrically the relationship between a
one-one function and its inverse is obtained by reflecting the graph of either
function in a mirror held along the line y = x. Henceforth such a process will
simply be termed reflection in a line. Notice that when using this reflection
property to construct the graph of an inverse function from the graph of the
function itself, both functions are represented with y plotted vertically and
x plotted horizontally. This follows because the range of f is the domain of
g, and vice versa.
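
For a function known only through a finite set of ordered pairs, the interchange
of numbers in each pair can be carried out directly. The Python sketch below is
an illustration under that assumption; the name invert_pairs and the sample
function 2x + 1 are hypothetical choices, not taken from the text.

```python
def invert_pairs(pairs):
    """Swap each pair (x, f(x)) into (f(x), x), refusing a many-one function."""
    inverse = {}
    for x, y in pairs:
        if y in inverse and inverse[y] != x:
            # Two different x share the image y, so no unique inverse exists.
            raise ValueError("function is many-one on this domain")
        inverse[y] = x
    return inverse


# A one-one example on the sample points 0, 1, 2, 3, 4.
graph_f = [(x, 2 * x + 1) for x in range(0, 5)]
g = invert_pairs(graph_f)          # g[y] is the x for which f(x) = y
```

Each pair (y, x) stored in g is the mirror image of a pair (x, y) of graph_f in
the line y = x. Applying invert_pairs to the pairs of $y = x^2$ sampled on
-2, ..., 2 raises an error, which reflects the need for a one-one mapping.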

No difficulty can arise in connection with a function and its inverse
because of the one-one nature of the mapping. Expressed more precisely, we
have used the obvious property illustrated by Fig. 2-6 (a) that a one-one
function f with domain [a, b] is such that $f(x_1) = f(x_2) \Rightarrow x_1 = x_2$ for all $x_1$
and $x_2$ in [a, b].

In graphical terms this result can only be true if the graph of f either
increases or decreases steadily as x increases from a to b. When either of
these properties is true of a function then it is said to be strictly monotonic.
In particular, if a function f increases steadily as x increases from a to b, as
in Fig. 2-6 (a), then it is said to be strictly monotonic increasing and, conversely,
if it decreases steadily then it is said to be strictly monotonic decreasing.

Slightly less stringent than the condition of strict monotonicity is the
condition that a function f be just monotonic. This is the requirement that f
be either non-decreasing or non-increasing, so that it is permissible for a
function that is only monotonic to remain constant throughout some part
of its domain of definition. The adjectives increasing and decreasing are again
used to qualify the noun monotonic in the obvious manner. Representative
examples of monotonic and strictly monotonic functions, all with domain of
definition [a, b] are shown in Fig. 2-7.
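
The four cases just described can be distinguished for a list of functional
values taken at increasing values of x. The Python sketch below is illustrative
only: the name classify is an assumption, and a finite sample can only suggest,
not prove, the behaviour of a function on the whole of [a, b].

```python
def classify(values):
    """Classify functional values sampled at increasing values of x."""
    steps = list(zip(values, values[1:]))       # consecutive pairs of values
    if all(u < v for u, v in steps):
        return "strictly monotonic increasing"
    if all(u > v for u, v in steps):
        return "strictly monotonic decreasing"
    if all(u <= v for u, v in steps):
        return "monotonic increasing"           # non-decreasing
    if all(u >= v for u, v in steps):
        return "monotonic decreasing"           # non-increasing
    return "not monotonic"


rising = classify([1, 2, 3])
flat_then_rising = classify([1, 1, 2])
mixed = classify([1, 3, 2])
```

The strict cases are tested first, so that a strictly monotonic sample is not
reported merely as monotonic.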



Fig. 2-7 Monotonic and strictly monotonic functions: (a) monotonic; (b) strictly
monotonic.

The example of a strictly monotonic decreasing function shown in Fig.
2-7 (b) has also been used to emphasize that a function need not be represented
by an unbroken curve. The curve has a break at the single point
$x = \alpha$ where it is defined to have the value $y = \beta$. However, as the value $\beta$
lies between the functional values on adjacent sides of $x = \alpha$ the function is
still strictly monotonic decreasing. Had we set $\beta = 0$, say, then the function
would be neither strictly monotonic nor even monotonic on account of this
one point!

It is sometimes useful to relate a function and its inverse by essentially
the same symbol and this is usually accomplished by adding the superscript
minus one to the function. Thus the function inverse to f is often denoted by
$f^{-1}$ which is not, of course, to be misinterpreted to mean 1/f. Before examining
some important special cases of inverse functions when many-one mappings
are involved, let us formalize our previous arguments.

Definition 2-2 Let the set onto which the one-one function f with domain
[a, b] maps the set S of points be denoted by f(S). Then we define the inverse
mapping $f^{-1}$ of f(S) onto S by the requirement that $f^{-1}(y) = x$ if and only if
y = f(x) for all x in [a, b].

It now only remains for us to consider how some important special functions
such as $y = x^2$, y = sin x, and y = cos x, together with other simple
trigonometric functions which are all many-one mappings, may have unambiguous
inverses defined.

Firstly, as we have already seen, the function $y = x^2$ gives a many-one
mapping of [-a, a] onto $[0, a^2]$. Here the difficulty of defining an inverse is
resolved by always taking the positive square root and defining two different
inverse functions

$x = +\sqrt{y}$ and $x = -\sqrt{y}$,

which are then both one-one mappings of $(0, a^2]$. The inversion must thus
be regarded as having given rise to two different functions; the one to be
selected depending on other factors as mentioned in connection with Fig.
2-3. If we recall that the domain of definition of a function forms an intrinsic
part of the definition of that function, then $y = x^2$ may be regarded as two
one-one mappings in accordance with the two inverses just introduced.

This is achieved by defining the many-one function $y = x^2$ on the domain
[-a, a] as the result of the two different one-one mappings

$y = x^2$ on $-a \le x < 0$ and $y = x^2$ on $0 < x \le a$,

the difference here being only in the domains of definition. The point 0 is
excluded from both domains since that single point maps one-one. By means
of this device we may, in general, reduce many-one mappings to a set of
one-one mappings so that the inversion problem is always straightforward.

It will suffice to discuss in detail only the inversion of the sine function,
after which a summary of the results for the other elementary trigonometric
functions will be presented in the form of a table. In general, as shown in
Fig. 2-8 (a), the function y = sin x maps an argument x in the set R of real
numbers onto [-1, 1] many-one, but it maps any of the restricted domains
$[(2n - 1)\frac{1}{2}\pi, (2n + 1)\frac{1}{2}\pi]$ corresponding to integral n onto [-1, 1] one-one.





Fig. 2-8 Principal branch of sine function: (a) principal branch of sin x giving
one-one mapping in $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$; (b) inversion of sin x by reflection in y = x.



Now in line with our approach to the inverse of the square root function,
the ambiguity as regards the function inverse to sine may be completely
resolved if we consider the many-one function y = sin x with $x \in R$ as being
replaced by an infinity of one-one functions y = sin x, with domains
$[(2n - 1)\frac{1}{2}\pi, (2n + 1)\frac{1}{2}\pi]$. For then in each domain corresponding to some
integral value of n, because the mapping there is one-one, an appropriate
inverse function may be defined without difficulty.

The intervals are all of length $\pi$ and are often said to define different
branches of the inverse sine function. In general, when no specific interval is
named we shall write x = Arcsin y, whenever y = sin x. The function
Arcsine thus denotes an arbitrary branch of the inverse sine function.
Because of the periodicity of the sine function, when considering the inverse
function it is only necessary to study the behaviour of one branch of Arcsine.
As is customary, we arbitrarily choose to work with the branch of the inverse
sine function associated with the domain $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$, calling this the principal
branch and denoting the inverse function associated with this branch by
arcsine. Hence for the inverse we shall always write x = arcsin y when
y = sin x and $-\frac{1}{2}\pi \le x \le \frac{1}{2}\pi$.
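
The branches of Arcsine can be computed from the principal branch. A value on
the branch with domain $[(2n - 1)\frac{1}{2}\pi, (2n + 1)\frac{1}{2}\pi]$ is given by
$x = n\pi + (-1)^n \arcsin y$, since $\sin(n\pi + (-1)^n t) = \sin t$ for integral n.
The Python sketch below uses this relation; the function name arcsin_branch is
an assumption, while math.asin is the standard library routine for the
principal branch.

```python
import math


def arcsin_branch(y: float, n: int = 0) -> float:
    """Inverse sine on the branch [(2n-1)pi/2, (2n+1)pi/2]; n = 0 is principal."""
    if not -1.0 <= y <= 1.0:
        raise ValueError("y must lie in [-1, 1]")
    # math.asin returns the principal value in [-pi/2, pi/2].
    return n * math.pi + (-1) ** n * math.asin(y)


principal = arcsin_branch(0.5)         # lies in [-pi/2, pi/2]
next_branch = arcsin_branch(0.5, 1)    # lies in [pi/2, 3pi/2]
```

Both principal and next_branch have sine equal to 0.5, but they lie in different
intervals of length $\pi$.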

In Fig. 2-8 (b) is shown in relation to the line y = x the function y = sin x
with domain of definition $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$ and the associated function y = arcsin x
with domain of definition [-1, 1]. The reflection property of inverse functions
utilized in connection with Fig. 2-6 (b) is again apparent here. It should
perhaps again be emphasized that when an inverse function is obtained by
reflection in the line y = x, then in both the curves representing the function
and its inverse, the variable y is plotted as ordinate (i.e. vertically) and the
variable x as abscissa (i.e. horizontally).

Table 2-2 summarizes information concerning the most important inverse
trigonometric functions and should be studied in conjunction with Fig. 2-9.
In general the notation for a function inverse to a named trigonometric
function is obtained by adding the prefix arc when referring to the principal
branch and Arc otherwise. In other books the convention is often to add the
superscript minus one after the named function, distinguishing the principal
branch by use of an initial capital letter when writing the function. Thus, for
example, some authors will write $\mathrm{Sin}^{-1}$ in place of arcsine and $\sin^{-1}$ in place
of Arcsine. Unfortunately notations are not uniform here and so when using
other books the reader would be well advised to check the notation in use.

Fig. 2-9 Principal branches of inverse cosine and tangent functions: (a) principal
branch of cos x; (b) inversion of cos x by reflection in y = x; (c) principal branch of
tan x; (d) inversion of tan x by reflection in y = x.



Table 2-2 Trigonometric functions and their inverse functions

Function     Domain                       Inverse function   Branch      Domain
y = sin x    [−½π, ½π]                    y = arcsin x       Principal   [−1, 1]
y = sin x    [(2n − 1)½π, (2n + 1)½π]     y = Arcsin x       Any         [−1, 1]
y = cos x    [0, π]                       y = arccos x       Principal   [−1, 1]
y = cos x    [nπ, (n + 1)π]               y = Arccos x       Any         [−1, 1]
y = tan x    (−½π, ½π)                    y = arctan x       Principal   (−∞, ∞)
y = tan x    ((2n − 1)½π, (2n + 1)½π)     y = Arctan x       Any         (−∞, ∞)
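
The difference between the principal branch arcsin and a general branch Arcsin in Table 2-2 can be illustrated numerically. The following is only a minimal Python sketch and is not part of the original text: the helper name arcsin_branch is hypothetical, and it assumes that the branch labelled n is the one taking values in [(2n − 1)½π, (2n + 1)½π], for which x = nπ + (−1)ⁿ arcsin y.

```python
import math


def arcsin_branch(y: float, n: int = 0) -> float:
    """Return the x in [(2n-1)*pi/2, (2n+1)*pi/2] for which sin(x) = y.

    n = 0 gives the principal branch (arcsin); any other integer n gives
    another branch of Arcsin. The function name is illustrative only.
    """
    if not -1.0 <= y <= 1.0:
        raise ValueError("y must lie in [-1, 1]")
    # sin(n*pi + (-1)^n * a) = sin(a), so the sine of the result is still y.
    return n * math.pi + (-1) ** n * math.asin(y)
```

With n = 0 the result agrees with math.asin, while n = 1 returns the solution of sin x = y lying between ½π and 3π/2.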



2-3 Some special functions

A number of special types of function occur often enough to merit some
comment. As the ideas involved in their definition are simple, a very brief
description will suffice in all but a few cases. To clarify these descriptions, the
functions are illustrated in Fig. 2-10.

(a) Constant function 

The constant function is a function y = f(x) for which f(x) is identically 
equal to some constant value for all x in the domain of definition [a, b].
Thus a constant function has the equation y = constant, for x ∈ [a, b].

(b) Step function 

Consider some set of n sub-intervals or partitions [a₀, a₁), [a₁, a₂), [a₂, a₃),
. . ., [aₙ₋₁, aₙ] of the interval [a₀, aₙ]. Associate n constants C₁, C₂, . . ., Cₙ
with these n sub-intervals. Then a step function defined on [a₀, aₙ] is the
function y = f(x) for which f(x) = Cᵣ for all x in the rth sub-interval. The
function will be properly defined provided a functional value is assigned to
all points x in [a₀, aₙ], including end points of the intervals. Usually it is
immaterial to which of two adjacent sub-intervals an end point is assigned
and one possible assignment is indicated in Fig. 2-10 (b), where a deleted end
point is shown as a circle and an included end point as a dot.

Fig. 2-10 Some special functions: (a) constant function; (b) step function;
(c) y = |x|; (d) even function; (e) odd function; (f) bounded function on [a, b].
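
The definition of a step function translates directly into a short program. The sketch below is an illustration in Python rather than anything taken from the original text; the name make_step_function and the sample break points are assumptions, and it adopts one possible end-point assignment in which each sub-interval is closed on the left and only the last is closed on the right.

```python
from bisect import bisect_right


def make_step_function(breakpoints, values):
    """Build f with f(x) = values[r] on [breakpoints[r], breakpoints[r + 1]).

    The last sub-interval is closed on the right, so every point of
    [a_0, a_n] is assigned a value. All names here are illustrative.
    """
    if len(breakpoints) != len(values) + 1:
        raise ValueError("need n + 1 break points for n values")
    if any(a >= b for a, b in zip(breakpoints, breakpoints[1:])):
        raise ValueError("break points must be strictly increasing")

    def f(x):
        if not breakpoints[0] <= x <= breakpoints[-1]:
            raise ValueError("x lies outside the domain of definition")
        # bisect_right counts the break points <= x; the min() clamps the
        # right-hand end point a_n into the last sub-interval.
        r = min(bisect_right(breakpoints, x) - 1, len(values) - 1)
        return values[r]

    return f


# Three sub-intervals [0, 1), [1, 2), [2, 3] with constants 5, -1, 2.
step = make_step_function([0, 1, 2, 3], [5, -1, 2])
```

Assigning the shared end points to the left-hand sub-intervals instead would only require replacing bisect_right by bisect_left and adjusting the clamping at a₀.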

(c) The function |x|

From the definition of the absolute value of x it is easily seen that the graph
of y = |x| has the form shown in Fig. 2-10 (c). It is composed of the line
y = x for x ≥ 0 and the line y = −x for x < 0.

(d) Even function 

An even function y = f(x) is a function for which f(−x) = f(x). The geometrical
implication of this definition is that the graph of an even function is
symmetrical about the y-axis so that the graph for negative x is the reflection
in the y-axis of the graph for positive x. Typical examples of even functions
are y = cos x, y = 1/(1 + x²) and the function y = |x| just defined.

(e) Odd function 

An odd function y = f(x) is a function for which f(−x) = −f(x). The geometrical
implication of this definition is that the graph of an odd function is
obtained from its graph for positive x by first reflecting the graph in the
y-axis and then reflecting the result in the x-axis. In Fig. 2-10 (e) the result of
the first reflection is shown as a dotted curve and its reflection in the x-axis
gives a second curve shown as a full line in the third quadrant which, together
with the original curve in the first quadrant, defines the odd function.
By virtue of the definition we must have f(0) = −f(0), so that f(0) = 0, showing
that the graph of an odd function defined at x = 0 must pass through the origin.
Typical odd functions are y = sin x and y = x³ − 3x. Most functions are neither
even nor odd. For example, y = x³ − 3x + 1 is not even, since
y(−x) = (−x)³ − 3(−x) + 1 = −x³ + 3x + 1 ≠ y(x), nor, by the same argument,
is it odd, for y(−x) ≠ −y(x).
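
The definitions of even and odd functions can be spot-checked numerically. The following Python sketch is an illustration only, not the author's method: the name classify_parity and the sample points are assumptions, and agreement at a handful of points is evidence rather than proof.

```python
import math


def classify_parity(f, samples, tol=1e-12):
    """Classify f as 'even', 'odd' or 'neither' from a few sample points.

    This is a numerical spot check on the given samples, not a proof.
    """
    is_even = all(abs(f(-x) - f(x)) <= tol for x in samples)
    is_odd = all(abs(f(-x) + f(x)) <= tol for x in samples)
    if is_even and is_odd:
        return "even and odd"  # only when f vanishes at every sample
    if is_even:
        return "even"
    if is_odd:
        return "odd"
    return "neither"


samples = [0.25, 0.5, 1.0, 2.0, 3.5]
cubic = lambda x: x**3 - 3 * x          # odd example from the text
shifted = lambda x: x**3 - 3 * x + 1    # neither even nor odd
```

Applied to math.cos, cubic and shifted, the check reproduces the classifications given in the text.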

(f) Bounded function

A function y = f(x) is said to be bounded on an interval if it is never
larger than some value M and never smaller than some value m for all values
of x in the interval. The numbers M and m are called, respectively, upper and
lower bounds for the function f(x) on the interval in question. It may of
course happen that only one of these conditions is true: if the function never
exceeds M then it is said to be bounded above, whereas if it is never less than m
it is said to be bounded below. A bounded function is thus a function that is
bounded both above and below. The bounds M and m need not be strict in
the sense that the function ever actually attains them. Sometimes when the
bounds are strict they are only attained at an end point of the domain of
definition of the function.

Of all the possible upper bounds M that may be assigned to a function
that is bounded above on some interval, there will be a smallest one M′, say.
Such a number M′ is called the least upper bound or the supremum of the
function on the interval and the name is usually abbreviated to l.u.b. or to
sup. Similarly, of all the possible lower bounds m that may be assigned to a
function that is bounded below on some interval, there will be a largest one
m′, say. Such a number m′ is called the greatest lower bound or the infimum
of the function on the interval and the name is usually abbreviated to g.l.b.
or to inf.

Not all functions are bounded either above or below, as evidenced by the
function y = tan x on (−½π, ½π), though it is bounded on any closed sub-interval
not containing either end point. Typical examples of bounded functions
on the interval (−∞, ∞) are y = sin x and y = cos x/(1 + x²). The
function y = 1/(x − 1) is bounded below by zero on the interval (1, ∞) but
is unbounded above, whereas the function y = 2 − x² is strictly bounded
above by 2 but is unbounded below on the interval (−∞, ∞).

(g) Convex and concave functions 

A convex function is one which has the property that a chord joining any 
two points A and B on its graph always lies above the graph of the function 
contained between those two points. Similarly, a concave function is one 
which has the property that a chord joining any two points A and B on its 
graph always lies below the graph of the function contained between those 
two points. Thus the function y = |x| shown in Fig. 2-10 (c) is convex on the
interval (−∞, ∞) whereas the function shown in Fig. 2-10 (d) is only concave
on the closed interval [−a, a].
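
The chord property used to define convexity lends itself to a sampled numerical test. The Python sketch below is offered as an illustration under stated assumptions: the names chord_gap and looks_convex are hypothetical, the test accepts a chord that lies on or above the graph (as happens for y = |x| when both end points have the same sign), and y = 4 − x² is merely an assumed stand-in for a concave function.

```python
def chord_gap(f, a, b, t):
    """Height of the chord from (a, f(a)) to (b, f(b)) above the graph.

    Evaluated at x = (1 - t)*a + t*b with 0 <= t <= 1; the result is
    non-negative where the chord lies on or above the graph.
    """
    x = (1 - t) * a + t * b
    chord = (1 - t) * f(a) + t * f(b)
    return chord - f(x)


def looks_convex(f, points, steps=20, tol=1e-12):
    """Sampled check that every chord between the points is not below f."""
    return all(
        chord_gap(f, a, b, k / steps) >= -tol
        for a in points
        for b in points
        if a < b
        for k in range(steps + 1)
    )


points = [-3.0, -1.5, -0.5, 0.0, 1.0, 2.5]
arch = lambda x: 4 - x * x  # assumed example of a concave function
```

A concave function in the sense of the text is one whose negative passes the same test.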

(h) Polynomial and rational functions 

A polynomial of degree n is an algebraic expression of the form

$$y = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0,$$

where n is a positive integer; it is defined for all x.

A rational function is a function which is capable of expression as the
quotient of two polynomials and so has the form

$$y = \frac{b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0}{a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0}$$

and is defined for all values of x for which the denominator does not vanish.
An example of a polynomial of degree 2 is the quadratic function
y = x² − 3x + 4; a typical rational function is

$$y = \frac{3x^2 - 2x - 1}{4x^3 + 11x^2 + 5x - 2},$$

which is defined for all values of x apart from x = −2, x = −1, and x = ¼,
at which points the denominator vanishes. For this reason these values are
called the zeros of the polynomial forming the denominator and they arise
directly from its factorization into the form

$$4x^3 + 11x^2 + 5x - 2 \equiv (4x - 1)(x + 2)(x + 1).$$
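
The factorization of the denominator, and hence the points at which the rational function is undefined, can be confirmed with exact arithmetic. This is a minimal Python sketch using the standard fractions module; the function names are illustrative and not part of the original text.

```python
from fractions import Fraction


def denominator(x):
    """4x^3 + 11x^2 + 5x - 2, the denominator of the rational function."""
    return 4 * x**3 + 11 * x**2 + 5 * x - 2


def factored(x):
    """The factorized form (4x - 1)(x + 2)(x + 1)."""
    return (4 * x - 1) * (x + 2) * (x + 1)


# Exact rational arithmetic avoids rounding error at x = 1/4.
zeros = [Fraction(-2), Fraction(-1), Fraction(1, 4)]
vanishes = [denominator(z) == 0 for z in zeros]
```

Since two cubics that agree at more than three points are identical, comparing denominator and factored at a few extra rational points is enough to confirm the identity.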

(i) Algebraic function 

An algebraic function arises when attempting to form the inverse of a rational
function. The function y = +√x for x ≥ 0 provides a typical example here.
More complicated examples are the functions:

$$y = x^{2/3}, \qquad y = x^2 + 2\sqrt{x} - 1, \qquad y = x\sqrt{x/(2 - x)}.$$

More precisely, we shall call the function y = f(x) algebraic if it may be
transformed into a polynomial involving the two variables x and y, the
highest powers of x and y both being greater than unity. This criterion may
easily be applied to any of the above examples. In the case of the last example,
squaring and multiplying by 2 − x soon shows that it is equivalent to the polynomial
2y² − xy² − x³ = 0, which is of degree 2 in y and 3 in x.

(j) Transcendental function 

A function is said to be transcendental if it is not algebraic. A simple example 
is y = x + sin x, which is defined for all x but is obviously not algebraic. 

(k) The function [x] 

On occasions when working with quantities that may only assume integral 
values it is useful to write y = [x] with the meaning that we assign to every 
real number x the greatest integer y that is less than or equal to it. Thus, for 
example, we have [−3] = −3, [−1.3] = −2, [0] = 0, [0.92] = 0,
[π] = 3, and [17] = 17.
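
In most programming languages the function [x] is available as a floor function. The short Python sketch below is an illustration only; the wrapper name greatest_integer is hypothetical, while math.floor is the standard library routine.

```python
import math


def greatest_integer(x: float) -> int:
    """Return [x], the greatest integer less than or equal to x."""
    return math.floor(x)


# Values quoted in the text; note that [-1.3] is -2, not -1.
examples = {x: greatest_integer(x) for x in (-3, -1.3, 0, 0.92, 17)}
```

Truncation towards zero, as performed by int(x), agrees with [x] only for x ≥ 0: int(-1.3) gives −1 whereas [−1.3] = −2.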

2-4 Digression on mappings 

Having now examined in some detail specific examples of functions providing 
one-one and many-one mappings, it will be helpful to take a slightly more 
general look at the notion of a mapping. We again appeal to the Venn 
diagram, but this time supplement it by the addition of arrows to suggest the 
form of mapping that is involved. 

In Fig. 2-11 pairs of closed curves have been used to represent the sets
A and B postulated in the formulation of the more general definition of a
function f given in Definition 2-1. Once again points inside a curve represent
elements in the set, with set A representing the domain of the function f and
set B the range of f. The arrows relating sets A and B in the three pairs of
diagrams are then self-explanatory when taken in conjunction with the
captions.





Fig. 2-11 Mappings: (a) B = f(A), a one-many mapping; (b) B = f(A), a many-one
mapping; (c) B = f(A), a one-one mapping.

The mappings illustrated in Fig. 2-11 are often said to be onto mappings,
in the sense that the set A is mapped by function f onto the entirety of set B.
Thus, in each case, every element in B is associated with at least one element
in A. Naturally if some set C containing B is considered in place of B, then
there will be elements of C that are not associated with any element in A. The
mapping of A into C by f is then said to be an into mapping.

For example, if the function concerned is y = x 2 , then it maps the set A 
comprising the interval [1, 2] into the set C comprising the interval [1,9], 
but onto the set B comprising the interval [1,4]. 

These ideas are of real importance when a double mapping is involved,
for then it is necessary to examine the relationship that exists between the
range of the first function and the domain of definition of the second. If the
first mapping is by a function f and the second mapping is by a function g,
then the result of the successive mappings is called the composition of f and g
and is usually denoted by f ∘ g. The order implies that f is the first mapping
which is then followed by g. Using perhaps more familiar terminology and
notation we are speaking here of the 'function of a function' g{f(x)}.

The general ideas involved here are illustrated in Fig. 2-12. There (a) and
(b) indicate the respective domains and ranges of f and g whilst (c) indicates
how, in general, the function f ∘ g has for its domain only part of A and for
its range only part of D.



Fig. 2-12 (a) Mapping by f of A onto B; (b) mapping by g of C onto D; (c) composition
f ∘ g.



The symbolic representation suggested in Fig. 2-12 can be made more
meaningful by considering the following. Let f(x) = 3x + 1 with domain
(−∞, 4/3] and g(x) = +√(9 − x) with domain [1, 9]. Then the range of f
is (−∞, 5] and the range of g is [0, 2√2]. The range of f thus only coincides
with the domain of g in the interval [1, 5]. Hence the part of the domain of g
that is common to the range of f is a one-one mapping by f of the interval
[0, 4/3]. This interval must then be the domain of f ∘ g. Next, the function g
maps [1, 5] onto the interval [2, 2√2], which must be the range of f ∘ g.
Thus we have obtained the following:



Domain of f:      (−∞, 4/3],
Range of f:       (−∞, 5],
Domain of g:      [1, 9],
Range of g:       [0, 2√2],
Domain of f ∘ g:  [0, 4/3],
Range of f ∘ g:   [2, 2√2].



Using direct algebraic substitution we see that in fact if f(x) = 3x + 1 and
g(x) = +√(9 − x), then f ∘ g = g{f(x)} = +√[9 − (3x + 1)] = +√(8 − 3x).
This confirms directly that f ∘ g maps [0, 4/3] onto [2, 2√2], but does not
take explicit account of the effect of the domain of g on the mapping.
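
The effect of the two domains on the composition can be made explicit in a program by letting each function reject arguments outside its own domain. The Python sketch below is an illustration under stated assumptions: the helper names f_then_g and in_domain are hypothetical, and f_then_g follows the convention of this section, in which f ∘ g means g{f(x)} (many other texts write this composition in the opposite order).

```python
import math


def f(x):
    """f(x) = 3x + 1 with domain (-inf, 4/3]."""
    if x > 4 / 3:
        raise ValueError("x lies outside the domain of f")
    return 3 * x + 1


def g(x):
    """g(x) = +sqrt(9 - x) with domain [1, 9]."""
    if not 1 <= x <= 9:
        raise ValueError("x lies outside the domain of g")
    return math.sqrt(9 - x)


def f_then_g(x):
    """The composition written f o g in the text, that is g{f(x)}."""
    return g(f(x))


def in_domain(func, x):
    """True when func accepts the argument x (illustrative helper)."""
    try:
        func(x)
    except ValueError:
        return False
    return True
```

Values of x below 0 are rejected by g because f(x) < 1, and values above 4/3 are rejected by f, which leaves [0, 4/3] as the domain of the composition.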

2-5 Curves and parameters

A parameter α may be associated with a curve in two quite different ways.
In the first situation we shall discuss, the parameter α occurs as a constant
in the equation describing the curve. Thus changing the value of α will change
the curve that is described. This simple idea underlies the geometrical concept
of an envelope, which will be taken up again later in connection with differentiation
and with differential equations.

In the second situation, α will appear as a variable associated with two
functions s(α) and t(α), which will describe separately the x and y coordinates
of points on any unbroken curve. This use of a parameter is called the
parameterization of a curve and is an alternative method of representing the
equation of the curve.

(a) Envelopes 

This situation is best explained by means of an example. Consider the equation

$$(x - \alpha)^2 + y^2 = \frac{\alpha^2}{1 + \alpha^2},$$

which in this form is easily seen to describe a circle of radius |α|/√(1 + α²)
with its centre on the x-axis at the point x = α. Obviously, changing α will
both move the centre of the circle and alter its radius, as shown in Fig. 2-13.

Fig. 2-13 Envelope shown as dotted line.



If α is allowed to vary in some interval, then the single equation will
describe a set of circles, each one corresponding to a different value assumed
by α in that interval. Collectively these circles are a family of circles with
parameter α. If a curve exists that is tangent to every member of a family of
curves, but is not itself a member of the family, then it is an envelope of the
family. An envelope can be a curve of infinite length or on occasions it may
reduce either to a curve of finite length or, in degenerate cases, to a single
point.

In Fig. 2-13 the envelope is shown as a dotted curve and, as would be
expected in this case, the envelope is symmetrical about both the x- and y-axes.

If the family of circles that led to this envelope is written in the form

$$(x - \alpha)^2 + y^2 - \frac{\alpha^2}{1 + \alpha^2} = 0,$$

then it is seen to be a special case of an equation in three variables having the
general form

f(x, y, α) = 0. (2-3)

This is the standard form for an equation defining a family of curves with
parameter α and it will be used later to determine the equation of the envelope
when it exists.

However, it is easy to see that a family of curves does not always have an
envelope associated with it, since the concentric circles x² + y² = α² form a
perfectly good family with parameter α, but clearly there is no line that is
tangent to each circle in the family.

Expression (2-3) is an implicit representation of a function in the sense
that it is not directly obvious how and when it is possible to re-express it in
the more familiar explicit form

y = F(x, α). (2-4)

(b) Parameterization of a curve 

We have seen that when a curve is represented by an explicit equation of the
form y = f(x), then for inversion reasons the mapping must be one-one.
In other words, either f must be strictly monotonic in its domain of definition
or, if not, it must be expressible piecewise as a set of new functions which are
strictly monotonic on suitably chosen domains.

A more general representation of a curve that overcomes the necessity
for sub-division of the domain, and even allows curves with loops, may be
achieved by the introduction of the notion of parametric representation of a
curve. The idea here is simple and is that instead of considering x and y
to be directly related by some function f, we instead consider x and y separately
to be functions of the variable parameter α. Thus we arrive at the pair of
equations



x = s(α), y = t(α), (2-5)

with a ≤ α ≤ b, say, which together define a curve. For any value of α in
[a, b] we can use these equations to determine unique values of x and y,
and hence to plot a single point on the curve represented parametrically by
Eqn (2-5). The set of all points described by Eqn (2-5) then defines a curve.
As a simple example of a curve without loops we may consider the
parametric equations

$$x = \alpha, \qquad y = \alpha^2 \qquad \text{for } -\infty < \alpha < \infty.$$



These obviously define a parabola that lies in the upper half plane and is
symmetrical about the y-axis with its vertex at the origin.
Elimination of α is easy here and results in the explicit representation y = x².
In more complicated cases the parameter cannot usually be eliminated and,
indeed, this should not be expected since parametric representation is more
general than explicit representation.

An important consequence of the parametric representation of a curve is
that increasing the value of the parameter defines a sense of direction along
the curve which is often very useful in more advanced applications of these
ideas. An example of a curve containing a loop is provided by the parametric
equations

$$x = \alpha^3 - \alpha, \qquad y = 4 - \alpha^2 \qquad \text{for } -2 \le \alpha \le 2,$$

which is shown in Fig. 2-14 together with the sense of direction defined by
increasing α.

Fig. 2-14 Parameterization of a curve defining sense of direction.



It is implicit in the concept of the parametric representation of a curve
that a given curve may be parameterized in more than one way. Hence
changing the variable in a parameterization will give a different parametric
representation of the same curve. Thus if in the example above we replace
the parameter α by the parameter β using the relationship α = β + 1, then
it is readily seen that

$$x = \beta^3 + 3\beta^2 + 2\beta, \qquad y = 3 - 2\beta - \beta^2 \qquad \text{for } -3 \le \beta \le 1.$$

This is an alternative parameterization of the same curve shown in Fig. 2-14.
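
That two parameterizations trace the same curve can be checked by evaluating both at corresponding parameter values. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the looped curve is x = α³ − α, y = 4 − α², which is the form implied by the substitution α = β + 1 in the equations above.

```python
def curve_alpha(alpha):
    """Point on the looped curve x = alpha^3 - alpha, y = 4 - alpha^2."""
    return (alpha**3 - alpha, 4 - alpha**2)


def curve_beta(beta):
    """The same curve after the substitution alpha = beta + 1."""
    return (beta**3 + 3 * beta**2 + 2 * beta, 3 - 2 * beta - beta**2)


# Integer parameter values keep the arithmetic exact.
alphas = [-2, -1, 0, 1, 2]
points = [curve_alpha(a) for a in alphas]
```

The values α = −1 and α = 1 give the same point, which is where the curve crosses itself to form the loop, and listing the points in order of increasing α displays the sense of direction along the curve.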

2-6 Functions of several real variables 

In physical situations, to say that a quantity depends only on one other 
quantity is usually a gross oversimplification. Indeed, this was so in the 
thermodynamic illustration used to introduce the notion of a function of one 
real variable, because we insisted on maintaining a constant volume of gas. 
In general the pressure p of a given gas will depend on both its temperature T
and its volume v. Here we would say that there was a functional relationship
between p, T, and v which, in an implicit form, may be expressed by the
equation

f(p, T, v) = 0. (2-6)

The function f occurring here is a function of three real variables and obviously
depends for its form on the particular gas involved.

Usually one of the three quantities, say p, is regarded as a dependent
variable with the others, namely T and v, being regarded as independent
variables. Solving Eqn (2-6) for p then gives rise to an explicit expression of
the form

p = g(T, v), (2-7)

with g then being called a function of two real variables. 

Just as with a function of a single real variable, in addition to specifying 
the functional form it is also necessary to stipulate the domain of definition 
of the function. Thus Eqn (2-7), which in thermodynamic terms would be 
called the equation of state of the gas, would only be valid for some range of 
temperature and volume. In this case the reason for the restriction on the 
temperature and volume is a physical one, whereas in other situations it is 
likely to be a purely mathematical one. 

Extending the ideas already introduced we shall now let R² denote the
set of all ordered pairs (x, y) of real numbers and let S be some subset of R².

Definition 2-3 We say f is a real valued function of the real variables
x and y defined in set S if, for every (x, y) ∈ S, there is defined a real number
denoted by f(x, y).



As is the case with a function of one variable, when the domain of definition
of a real valued function of two or more real variables is not specified
it is understood to be the largest possible domain of definition.
Thus, for example, the largest subset S ⊂ R² in which the function
f(x, y) = √(1 − x² − y²) is defined is given by

S = {(x, y) ∈ R² | x² + y² ≤ 1}.

This concept of a function immediately extends to include functions of
more than two variables. Using Rⁿ to denote the set of all ordered n-tuples
(x₁, x₂, . . ., xₙ) of real numbers of which S is some subset, the following
definition can be formulated.

Definition 2-4 We shall say that f is a real valued function of the real
variables x₁, x₂, . . ., xₙ defined in set S if, for every (x₁, x₂, . . ., xₙ) ∈ S,
there is defined a real number denoted by f(x₁, x₂, . . ., xₙ).

A typical example of a function of the three variables x, y, z is provided
by f(x, y, z) = √(2 − x) + √(9 − y²) + √(16 − z⁴). The largest subset
S ⊂ R³ for which this function may be defined is obviously

S = {(x, y, z) ∈ R³ | x ≤ 2; −3 ≤ y ≤ 3; −2 ≤ z ≤ 2}.
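
The largest domain of definition of this three-variable example can be compared with the set S by direct evaluation. The Python sketch below is an illustration and not part of the original text; the names in_S and is_defined are hypothetical, and the comparison relies on math.sqrt raising ValueError for a negative argument.

```python
import math


def f(x, y, z):
    """f(x, y, z) = sqrt(2 - x) + sqrt(9 - y^2) + sqrt(16 - z^4)."""
    return math.sqrt(2 - x) + math.sqrt(9 - y**2) + math.sqrt(16 - z**4)


def in_S(x, y, z):
    """Membership test for S = {x <= 2, -3 <= y <= 3, -2 <= z <= 2}."""
    return x <= 2 and -3 <= y <= 3 and -2 <= z <= 2


def is_defined(x, y, z):
    """True when every square root in f has a non-negative argument."""
    try:
        f(x, y, z)
    except ValueError:
        return False
    return True
```

On any grid of trial points the two tests should agree, since each square root constrains exactly one of the three variables.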

The geometrical idea underlying the graph of a function of a single variable
also extends to real functions f of two real variables x, y. Denote the value of
the function f at (x, y) by z, so that we may write z = f(x, y). Then with each
point of the (x, y)-plane at which f is defined we have associated a third
number z = f(x, y). Taking three mutually perpendicular straight lines with
a common origin as axes, we may then identify two of the axes with the
independent variables x and y and the third with the dependent variable z.
The ordered number triples (x, y, z) = (x, y, f(x, y)) may then be plotted as
points in a three-dimensional geometrical space. The set of points (x, y, z)
corresponding to the domain of definition of the function f(x, y) then defines
a surface which, in practice, usually turns out to be smooth. It is conventional
to plot z vertically.

On account of the geometrical representation just described, even in Rⁿ
it is customary to speak of the ordered n-tuple of numbers (x₁, x₂, . . ., xₙ)
as defining a 'point' in the 'space' Rⁿ.

By way of illustration of a graph of a function of two variables we now
consider

$$f(x, y) = \frac{x^2}{4} + \frac{y^2}{9} \qquad \text{with} \qquad \frac{x^2}{4} + \frac{y^2}{9} \le 2,$$

where the inequality serves to define a domain of definition for the function.
The surface described by this function has the equation z = x²/4 + y²/9
and the domain of definition is the interior and boundary of the curve
x²/4 + y²/9 = 2. If this latter expression is rewritten in the form
x²/8 + y²/18 = 1 then it can be seen that the domain of definition of f is in
fact the interior and boundary of an ellipse in the (x, y)-plane having
semi-minor axis 2√2 and semi-major axis 3√2, and being centred on the origin.
As f(x, y) is an essentially non-negative quantity it follows directly that
0 ≤ z ≤ 2 in the domain of f.

Fig. 2-15 Surfaces and level curves: (a) representation of surface; (b) level curves.

To deduce the form of the surface, two further geometrical concepts are 
helpful. The first is the notion of the curve defined by taking a cross-section 
of the surface parallel to the z-axis. The second is the notion of a contour 
line or level curve, defined by taking a cross-section of the surface perpendicular 
to the z-axis. 

To examine a cross-section of the surface by the plane y = a, say, we
need only set y = a in f(x, y) to obtain z = x²/4 + a²/9, showing that the
curve so defined is a parabola with vertex at a height z = a²/9 above the
y-axis. A similar cross-section by the plane x = b shows that the curve so
defined is z = b²/4 + y²/9, which is also a parabola, but this time with its
vertex at a height z = b²/4 above the x-axis. (See Fig. 2-15 (a).) If desired,
sections by other planes parallel to the z-axis may also be used to assist
visualization of the surface.

The curve defined by a section of the surface resulting from a cross-section 
taken perpendicular to the z-axis is called a contour line or level curve by 
direct analogy with cartography, where such lines are drawn on a map to 
show contours of constant altitude. Level curves are obtained by determining 
the curves in the (x, y)-plane for which z = constant, and it is customary to
draw them all on one graph in the (x, y)-plane with the appropriate value of
z shown against each curve. (See Fig. 2-15 (b).)

Let us determine the level curve in our example corresponding to z = ½,
which is representative of z in the range 0 < z < 2. We must thus find the
curve with the equation x²/4 + y²/9 = ½, which we choose to rewrite in the
standard form x²/2 + y²/(9/2) = 1. This shows that it describes an ellipse
centred on the origin with semi-minor axis √2 and semi-major axis 3/√2.
It is not difficult to see that all the level curves are ellipses, the one
corresponding to z = 2 being the boundary of the domain of f and the one
corresponding to z = 0 degenerating to the single point at the origin.
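
The semi-axes of any level curve of this surface follow from the same rewriting into standard form. The Python sketch below is an illustration, with the hypothetical name level_curve_semi_axes: dividing x²/4 + y²/9 = c by c gives x²/(4c) + y²/(9c) = 1, so the semi-axes are 2√c and 3√c.

```python
import math


def level_curve_semi_axes(c):
    """Semi-axes of the level curve x^2/4 + y^2/9 = c of z = x^2/4 + y^2/9.

    Returns (semi-axis along x, semi-axis along y) = (2*sqrt(c), 3*sqrt(c)).
    Only 0 <= c <= 2 is accepted, the range of z on the stated domain.
    """
    if not 0 <= c <= 2:
        raise ValueError("c lies outside the range of f on its domain")
    return (2 * math.sqrt(c), 3 * math.sqrt(c))
```

For c = ½ this returns √2 and 3/√2, for c = 2 it returns the semi-axes 2√2 and 3√2 of the boundary of the domain, and for c = 0 both semi-axes vanish.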



PROBLEMS 

Section 2-1 

2-1 Sketch the graphs of these functions:

(a) f(x) = x² − 3x + 2 (−1 < x < 3);
(b) f(x) = x + sin x (−π/2 < x < π/2);
(c) f(x) = x³ (−2 < x < 2);
(d) f(x) = x² + 1/x (0.2 < x < 2);
(e) f(x) = x + 1/x² (0.5 < x < 5).

2-2 Determine the domain and the range of each of functions (a) to (e) defined
above.

2-3 Determine the range of the function f(x) = x² + 1 corresponding to each of
the following domains and state when the mapping is one-one and when it is 
many-one: 
(a) [-1,1]; (b) (2,4); (c) [-2,4]; (d) [-3,1]. 

2-4 Find the largest domain of definition for each of the following functions : 
(a) f(x) = x³ + 3; (b) f(x) = x² + √(1 − x²);

(c) f(x) = x² + √(1 − x³); (d) f(x) = 1/(x² − 1);

(e) f(x) = x + 1/x; (f) f(x) = x²/(1 + x²).

2-5 Let f(n) denote the function that assigns to any positive integer n the number of
positive integers whose square is less than or equal to n + 2. By enumerating
the first few values of f(n) deduce the values of n for which f(n) = 3.

2-6 An integer m is said to be a prime number if its only factors are 1 and m. 
Given that f(n) is the function that associates with n the number of primes less
than or equal to 2n + 1, enumerate the first ten values of f(n).

2-7 Give two examples of functions which are defined only for discrete values of 
the dependent and independent variables. 

2-8 Sketch representative members of the two pencils of lines described by
y = α(x − 1) + 2 and y = β(x − 2) + 3, where α and β are parameters.
Locate the two singular points and suggest how α and β may be used as
coordinates for points in the plane of the two pencils. When will the coordinates
α and β fail to identify points?

2-9 Suppose that f is the function that assigns to every qualified driver the name of
the driving examiner who issued his licence. Identify the domain A and the
range B of f, stating the nature of the mapping involved.

2-10 Give two examples of functions relating non-numerical quantities. 

Section 2-2 

2-11 Sketch the graphs of the following functions in their stated domains of
definition and in each case use the process of reflection in the line y = x to
construct the graph of the inverse function:

(a) f(x) = x³ with x ∈ [−2, 2];
(b) f(x) = x + sin x with x ∈ [0, π/2];
(c) f(x) = x/(1 + x²) with x ∈ [−1, 2].

2-12 Where appropriate, classify the following functions as either monotonic or
strictly monotonic increasing or decreasing on the stated domains of definition:

(a) f(x) = x² for x ∈ [−1, 2];
(b) f(x) = x² for x ∈ [−1, 0);
(c) f(x) = sin x for x ∈ [−3π/4, π/4];
(d) f(x) = cos x for x ∈ [0, π];
(e) f(x) = tan x for x ∈ [−π/4, π/4];
(f) f(x) = x for x ∈ [0, 1], f(x) = 1 for x ∈ (1, 2], f(x) = x²/4 for x ∈ (2, 6];
(g) f(x) = x for x ∈ [1, 2), f(x) = x² for x ∈ [2, 4].

2-13 Complete the entries in this table: 



f , _. Is mapping f' 1 when it 

■^ ' one-one exists 



X 


[-3,1] 


X 3 




1/(1 + X) 


[1,3] 


sin x 


[-K W 


cos (a: + Jn) 


[O.ir] 


tan [x — \tt\ 


[0, \n\ 



(2,4] 



Section 2-3

2-14 Sketch these functions in their associated domains of definition:

(a) f(x) = |2x| for x ∈ [−2, 2];
(b) f(x) = x + |x| for x ∈ [−2, 2];
(c) the step function assuming the values 1, 2, −3, 2, 4 on the x intervals
[0, 1), [1, 2], (2, 3.5), [3.5, 4], and (4, 5], respectively. Identify end points
belonging to a line by a dot and end points deleted from a line by a circle.
(d) f(x) = |x| for x ∈ [0, 1), f(x) = |x − 1| for x ∈ [1, 2), f(x) = |x − 2| for x ∈ [2, 3].



2-15 Where appropriate, classify the following functions as even or odd:

(a) f(x) = x + |x|;
(b) f(x) = x + sin 2x;
(c) f(x) = x² + sin x;
(d) f(x) = 1/x;
(e) f(x) = x²/(1 + x²)²;
(f) f(x) = x⁵ − x³ + x;
(g) f(x) = 2 cos x + sin x.

It is obvious that any arbitrary function f(x) which is defined in an interval
J containing the origin may be written in the form

f(x) = ½(f(x) + f(−x)) + ½(f(x) − f(−x)),

in any interval J′ ⊂ J that is symmetric about the origin. Such an interval J′
is said to be interior to J. This shows that any such f(x) is expressible as the
sum of an even function ½(f(x) + f(−x)) and an odd function ½(f(x) − f(−x))
within J′. Apply this result to display the following functions as the sum of
even and odd parts, in each case stating the largest interval J′ for which the
result is true:

(h) f(x) = 1 + x³ + x sin x for −2π < x < 3π;
(i) f(x) = 1 + x + |x| sin x for −3π < x < 3π;
(j) f(x) = 1 − x + 2x² + 4x³ for −4 < x < 3.

2-16 Determine if upper and lower bounds exist for the following functions and,
when appropriate, state their values and where they occur on the respective
domains of definition:

(a) f(x) = 1/x for x ∈ [1, 4];
(b) f(x) = 1/x for x ∈ (0, 3];
(c) f(x) = 1 + x² for x ∈ [−2, 1];
(d) f(x) = sin x for x ∈ [0, 3π/2];
(e) f(x) = tan x for x ∈ (−π/2, π/2).

2-17 The pairs of numbers enclosed by the curly brackets following each problem 
are upper and lower bounds for the associated function in its stated domain 
of definition. State whether or not each of these bounds is strict : 

(a) f(x) = x³ + x + 1 with x ∈ [1, 2], {0, 11};
(b) f(x) = sin x with x ∈ [0, π/2], {0, 2};
(c) f(x) = 1/(1 + x²) with x ∈ [0, 2], {1/6, 2};
(d) f(x) = sin (1/x) with x ∈ [2/π, 30], {0, 1}.

2-18 Determine by sketching whether the following functions are convex, concave
or neither on their stated domains of definition:

(a) f(x) = x³ for x ∈ [1, 3];
(b) f(x) = x³ for x ∈ [−1, 1];
(c) f(x) = a² − x² for x ∈ [−a/2, a];
(d) f(x) = x + sin x for x ∈ [0, π/2];
(e) f(x) = sin x for x ∈ [0, π].

2-19 Give examples of polynomials of degrees 3, 4, and 5 and of a rational function 
having a numerator of degree 2 and a denominator of degree 5. 

2-20 Classify the following functions as polynomial, rational, or algebraic. When 
the function is algebraic, state the degrees of x and y in the polynomial that is 
involved after the surds and fractions have been cleared : 

(a) y = x³ − x² + 1;
(b) y = x√(3 − x);
(c) y = (x − 1)/(x⁴ + 3x³ − x² + x + 1);
(d) y = x + 3√(x² − 2);
(e) y = (x³ − 3x + 2)/(x − 1).

Section 2-4 

2-21 Complete the entries in the following table by determining whether the
functions f map the stated domains A 'into' or 'onto' the domains B.

f               A                           B                                 Into or onto mapping

$x^3$           $[1, 3]$                    $[0, 30]$
$x + \sin x$    $[0, \tfrac{1}{2}\pi]$      $[0, \tfrac{1}{2}(2 + \pi)]$
$x^2$           0,4]                        $[1, 16]$
$x^4$           $[-1, 2]$                   $[0, 16]$


2-22 Given that $f(x) = 2x - 7$ with domain $(-\infty, 20]$ and $g(x) = 10 - x$ with
domain $[-6, \infty)$, determine the domain and the range of the composition
$f \circ g$.

2-23 Given that/(x) = x-+ 1 with domain (-oo, oo) and g(x) = 2 + V(4 — x) 
with domain [—5, 4], determine the domain and the range of the composition 
$f \circ g$.



PROBLEMS / 71 



Section 2-5 

2-24 Draw the circles corresponding to a = |, J, \, 1, and 2 in the equation 
$(x - 1)^2 + (y - a)^2 = a^2/(1 + a^2)$ and sketch the envelope, indicating its
asymptotes for large positive and negative a.

2-25 Draw the circles corresponding to a = J, J, 1, 2, and 3 in the equation 
(x — a) 2 + j 2 = I a 2 and draw the envelope. 

2-26 Deduce the envelope of the family of circles $(x - a)^2 + y^2 = a^2$, with
parameter a.

2-27 Sketch representative ellipses belonging to the family $x^2/a^2 + y^2/(4 - a)^2 = 1$,
with parameter a, and deduce the shape of the envelope.

2-28 Draw representative members of the family of straight lines $y = ax + 2/a$,
with parameter a, and deduce the shape of the envelope.

2-29 Sketch the curve represented by the parametric equations $x = 2\cos a$,
$y = \sin a$ for $-\pi/2 < a < \pi/2$.

2-30 Sketch the curve represented by the parametric equations $x = a^2 + 1$, $y = a^3$
for $-2 < a < 2$.

2-31 Sketch the curve represented by the parametric equations $x = a^3 + a^2 - 2a$,
$y = 5 - a^2$ for $-3 < a < 2$. Indicate by arrows on the curve the sense of
direction corresponding to increasing a.

2-32 Sketch the curve represented by the parametric equations $x = \cos a + 4\cos(a/3)$,
$y = \sin a + 4\sin(a/3)$ for $0 < a < 3\pi/2$. Use arguments involving
even and odd functions to deduce the form taken by the curve for
$0 < a < 6\pi$.

2-33 Suggest two different parametric representations for the curve $y = x^2 + x + 1$
for $0 < x < 2$.

Section 2-6 

2-34 What are the largest domains of definition for the following functions of 
several variables: 

(a) $f(x, y) = 1 + x^2 + y^2$;

(b) f(x,y) = (x 2 + F W0 - x 2 - y 2 ); 

(c) $f(x, y) = (\sin xy)/(x^2 + y^2 + 1)$;

(d) $f(x, y) = 3x^2 + y^2 + \sqrt{2 - y} + \sqrt{4 - x^2}$;

(e) $f(x, y, z) = \sqrt{3 - x} + x\sqrt{9 - y} + y\sqrt{1 - z^2}$;

(f) $f(x, y, z) = \sqrt{x^2 + y^2 - 1} + \sqrt{4 - x^2 - y^2 - z^2}$.

2-35 The function $f(x, y) = x^2 y$ has for its domain of definition the rectangle in
the (x, y)-plane defined by $|x| < 3$, $|y| < 2$. Deduce the shape of the curves
defined by cross-sections of the surface $z = f(x, y)$ taken by the three planes
x = -2, x = 0, and x = 2 that are parallel to the (y, z)-axes and by the three
planes y = -2, y = 0, and y = 2 that are parallel to the (x, z)-axes, using
your results to sketch the surface. Sketch on one diagram the level curves
corresponding to z = -4, z = -2, z = 0, and z = 6.

2-36 Sketch the surface $z = f(x, y)$ defined by the function $f(x, y) = 1/(1 + x^2 + y^2)$
in the domain $|x| < 4$, $|y| < 4$. Draw the level curves corresponding
to z = 1/9, z = 1/3, z = 2/3, and z = 1.






2-37 The surface $z = f(x, y)$ is defined by the function $f(x, y) = 1/[(x - 1)^2 + (y - 2)^2 - 1]$
with $2 < (x - 1)^2 + (y - 2)^2 < 9$. Deduce the domain of
definition of the function and then sketch the level curves corresponding to 
z = i, 2 = i, and z = f on the same diagram. Use your result to sketch the 
surface. [Hint: Use the fact that the circle of radius p with centre at (a, b) has
the equation $(x - a)^2 + (y - b)^2 = p^2$.]



Sequences, limits, and 
continuity 



3-1 Sequences 

The notion of a 'sequence' is a constantly recurring one in everyday life, 
where it usually implies the ordering of some set of events with respect to 
time. The sets of events that are so ordered, or arranged, are very varied and 
may be either numerical or non-numerical in nature. Typical examples of 
commonplace sequences in these categories are these: 

(a) the sequence of months in a year ; 

(b) the sequence of digits identifying a telephone subscriber; 

(c) the sequence of machining operations required to make a certain 
component. 

However, sequences are not necessarily decided by the chronological 
order of events and they are often determined instead by some attribute 
possessed by the members of the set to be ordered. Thus, for example, two 
commonly occurring sequences to be found in any library are the entries in the 
alphabetic catalogues of authors and titles, neither of which is in the
chronological order of acquisition of the books. Although these general ideas 
could be discussed at greater length, such an examination is inappropriate 
here, and it must suffice that these few examples show that sequences are 
commonplace in the world around us, and that they need not necessarily 
involve numbers. 

These ideas find an immediate parallel in mathematics, where the natural 
order existing in R combined with the arithmetic properties discussed in 
Chapter 1 enables us to deal very successfully and in great detail with ques- 
tions relating to mathematical sequences. Our main pre-occupation in this 
book will be with sequences of numbers and sequences of functions so we 
must first make the mathematical notion of a sequence more precise. Before 
doing this however we must first issue a word of warning concerning the 
colloquial usage of the words sequence and series, and on their mathematical 
usage which is quite different. Colloquially the words sequence and series are 
often used interchangeably, but in mathematics they have two quite different 
meanings which must never be confused. In brief, in mathematical terms a 
sequence is a set of quantities that is enumerated in a definite order, whereas a 
series involves the sum of a set of quantities. Thus 1, 3, 5, 7, 9, . . . is a
sequence but $1 + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{16} + \cdots$ is a series.

If a sequence is composed of elements or terms u belonging to some set S, 



74 / SEQUENCES, LIMITS, AND CONTINUITY CH 3 

then it is conventional to indicate their order by adding a numerical suffix
to each term. Consecutive terms in the sequence are usually numbered
sequentially, starting from unity, so that the first few terms of a sequence
involving u would be denoted by $u_1, u_2, u_3, \ldots$. Rather than write out a
number of terms in this manner this sequence is often represented by $\{u_n\}$,
where $u_n$ is the nth term of the sequence. The sequence depends on the set
chosen for S and the way suffixes are allocated to elements of S. A sequence
will be said to be infinite or finite according as the number of terms it contains
is infinite or finite and, unless explicitly stated, all sequences will be assumed
to be infinite. The notation for a sequence is often modified to $\{u_n\}_{n=1}^{N}$ when
only a finite number N of terms is involved.

As an example of an infinite numerical sequence, let S be the set of real
numbers and the rule by which suffixes are allocated be that to each integer
suffix n we allocate the number $1/2^n$ which belongs to R. We thus arrive at the
infinite sequence $u_1 = 1/2$, $u_2 = 1/2^2$, $u_3 = 1/2^3, \ldots$, which could either be
written in the form

$$\frac{1}{2},\ \frac{1}{2^2},\ \frac{1}{2^3},\ \ldots,\ \frac{1}{2^n},\ \frac{1}{2^{n+1}},\ \ldots$$

or, more concisely, in the form

$$\{1/2^n\}.$$

Had the set S still been the set R of real numbers, but the rule of allocation
of suffixes been changed, so that to each integer suffix n chosen from the first
N natural numbers we allocated the number $1/(2n + 1)$, then the finite
sequence

$$\frac{1}{3},\ \frac{1}{5},\ \frac{1}{7},\ \ldots,\ \frac{1}{2N + 1}$$

would have resulted.

If we use the notion of a function f(x) which is defined only for integral
values of the argument x, the following concise definition can be formulated.

Definition 3-1 In mathematical terms a sequence is a function f defined
only for integer values of its argument and having for its range an arbitrary
set S.

Hence the first sequence that was displayed could be regarded as resulting
from the function $f(x) = 1/2^x$ with $u_n = f(n)$, where n is always a positive
integer. By exactly similar reasoning, the second sequence can be derived
from the function $f(x) = 1/(2x + 1)$ by setting $u_n = f(n)$.
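
As an informal illustration of Definition 3-1, the two sequences above can be written as functions of an integer argument. The following Python sketch is not part of the original text: the names `f_geometric`, `f_odd_reciprocal`, and `terms` are hypothetical, and exact rational arithmetic is assumed through the standard `fractions` module.

```python
from fractions import Fraction


def f_geometric(n: int) -> Fraction:
    """u_n = 1/2**n, the general term of the infinite sequence {1/2^n}."""
    return Fraction(1, 2 ** n)


def f_odd_reciprocal(n: int) -> Fraction:
    """u_n = 1/(2n + 1), the general term of the second (finite) sequence."""
    return Fraction(1, 2 * n + 1)


def terms(f, count: int) -> list:
    """Return the terms u_1, ..., u_count of the sequence defined by f."""
    return [f(n) for n in range(1, count + 1)]


if __name__ == "__main__":
    print(terms(f_geometric, 4))       # first four terms of {1/2^n}
    print(terms(f_odd_reciprocal, 4))  # the finite sequence with N = 4
```

The suffix n plays the role of the function argument, so truncating the range of n at N is all that distinguishes the finite sequence from the infinite one.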

The connection between functions and sequences that is established in 
this definition makes it appropriate to describe numerical sequences in the 
same terms as would be used to describe the function giving rise to them. 



SEC 3-1 SEQUENCES / 75 

Thus if the terms of a sequence $\{u_n\}$ are such that $m < u_n < M$ for all values
of n then the sequence is said to be bounded, whilst if $u_{n+1} > u_n$ for all n
then the sequence is said to be strictly monotonic increasing. The terms bounded
above, bounded below, unbounded, strictly monotonic decreasing, monotonic,
and oscillating, etc., can also be used in the obvious manner as shown below.

Example 3-1

(a) $\{1/n\}_{1}^{\infty}$ is a bounded, strictly monotonic decreasing sequence.
The upper bound 1 is strict but the lower bound 0 is never actually attained.

(b) $\{1/\sin(1/n)\}_{1}^{\infty}$ is a strictly monotonic increasing sequence, strictly
bounded below by $(\sin 1)^{-1}$ but unbounded above.

(c) $\{(-1)^n/n\}_{1}^{\infty}$ is a bounded sequence with strict upper bound $\tfrac{1}{2}$ and
strict lower bound -1.

(d) $\{u_n\}_{1}^{\infty}$ where $u_{2m-1} = m/(m + 1)$ and $u_{2m} = u_{2m-1}$. The first
six terms of this sequence are 1/2, 1/2, 2/3, 2/3, 3/4, 3/4 corresponding
pairwise, respectively, to m = 1, 2, and 3. The sequence is thus both
bounded and monotonic increasing. It is not strictly monotonic increasing
because pairs of terms are equal. The lower bound $\tfrac{1}{2}$ is strict,
but the upper bound 1 is never actually attained.

(e) $\{(-1)^n\}$ is an oscillating but bounded sequence with strict
upper bound 1 and strict lower bound -1.

(f) $\{(-2)^n\}$ is an oscillating but unbounded sequence.
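
The properties listed in Example 3-1 can be explored numerically on a finite number of terms. The sketch below is an illustration only, with hypothetical helper names; inspecting the first 50 terms can suggest, but of course cannot prove, that a sequence is monotonic or bounded.

```python
import math


def first_terms(u, count):
    """Return [u(1), ..., u(count)]."""
    return [u(n) for n in range(1, count + 1)]


def is_strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


def is_strictly_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))


def is_monotonic_increasing(values):
    """Non-strict version: equal neighbouring terms are allowed."""
    return all(a <= b for a, b in zip(values, values[1:]))


# General terms of the six sequences (a)-(f); in (d) the index m = (n + 1) // 2
# gives u_(2m-1) = u_(2m) = m / (m + 1).
examples = {
    "a": lambda n: 1 / n,
    "b": lambda n: 1 / math.sin(1 / n),
    "c": lambda n: (-1) ** n / n,
    "d": lambda n: ((n + 1) // 2) / ((n + 1) // 2 + 1),
    "e": lambda n: (-1) ** n,
    "f": lambda n: (-2) ** n,
}

if __name__ == "__main__":
    for label, u in examples.items():
        vals = first_terms(u, 50)
        print(label, min(vals), max(vals),
              is_strictly_increasing(vals), is_strictly_decreasing(vals))
```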

Just as a graph proved to be useful when representing functions, so also
may it be used to represent sequences. Exactly the same method of representation
can be adopted, but this time, since the domain of the function
defining the sequence is the set of natural numbers, the graph of a sequence
will be a set of isolated points. A typical example is the graph of the first
few terms of the sequence $\{u_n\}$ with $u_n = [n + (-1)^n]/n$ which are shown as
dots in Fig. 3-1 (a).

An obvious deficiency of this representation is that the horizontal axis
must be made unreasonably long if a large number of terms are to be represented.
This can be overcome by the following simple device which is sometimes
of use since it compresses the representation of numbers 1 to infinity
onto a line of finite length. The idea is illustrated in Fig. 3-1 (b) where, on the
horizontal axis, the integer n is associated with a point distant 1/n to the left
of a fixed point P. The left end point of the line segment is then associated
with the value 1, the mid-point with the value 2, and so on, with the point P
itself corresponding to an infinite value of n.

An even simpler graphical representation than either of these is often 






used in which the values of successive terms in the sequence are plotted
one-dimensionally as points on a straight line relative to some fixed origin.
Because of the identification of the numerical value of a term of the sequence
with a point on a line, the behaviour of a sequence is often spoken of in terms
of the behaviour of the points in this representation (that is, there is a one-one
mapping of $\{u_n\}$ onto the straight line). In terms of this representation, the same



Fig. 3-1 Two alternative graphs of the sequence $\{1 + (-1)^n/n\}$: (a) normal graph;
(b) compressed horizontal axis.




Fig. 3-2 Sequence $\{1 + (-1)^n/n\}$ plotted on a line (annotation: all points $u_n$ for n > 5
lie in the marked neighbourhood of unity).



sequence that gave rise to Fig. 3-1 (a) and (b) will appear as in Fig. 3-2. This
could also have been obtained from Fig. 3-1 (a) and (b) by projecting the
points of the graphs horizontally across to meet the vertical axis.

In each of these three representations, the tendency for the points of the
sequence $\{1 + (-1)^n/n\}$ to cluster around the value unity as n increases is
obvious and clearly expresses an important property possessed by the sequence.
We shall now explore this more fully.

In the sequence just discussed it is obvious that as n increases, so the
points of the sequence cluster ever closer to the unit point in Fig. 3-2. If we
adopt the convention of calling an open interval (a, b) containing some fixed
point a neighbourhood of that point, then it is not difficult to see that any
neighbourhood of the point unity will contain an infinite number of points
of the sequence $\{u_n\}$. In fact in this case we can assert that no matter how small
the length b - a of the neighbourhood, there will always be an infinite number
of points in (a, b) and there will always be a finite number of points outside
(a, b). This is even true when b - a shrinks virtually to zero!

The fact that any neighbourhood of the value unity has the property that 
an infinite number of points of the sequence are contained within it, whereas 
only a finite number of points lie without it, is recognized by saying that the 
limit of the sequence is unity. On account of this name the point corresponding 
to the value unity in Fig. 3-2 is called a limit point of the sequence. We shall 
examine the idea of a limit in the next section, and so for the moment will 
confine discussion to limit points. For this we shall require the notion of a 
sub-sequence. Henceforth, by a sub-sequence we shall mean a sequence
$u_{n_1}, u_{n_2}, \ldots, u_{n_m}, \ldots$, of terms belonging to the sequence $\{u_n\}$, where
$n_1, n_2, \ldots, n_m, \ldots$ is some numerically ordered set of integers selected
from the complete set of natural numbers. Thus $u_1, u_9, u_{27}, u_{31}, \ldots$ is a
sub-sequence of $u_1, u_2, u_3, \ldots$ and obviously $\{u_1, u_9, u_{27}, u_{31}, \ldots\} \subset \{u_n\}$.

In terms of this we now give the following formal definition of a limit
point of a sequence $\{u_n\}$.

Definition 3-2 A point $u^*$ is said to be a limit point of the sequence $\{u_n\}$
if every neighbourhood of $u^*$ contains an infinite number of elements of
the sequence $\{u_n\}$.




Since we have not insisted that there be a finite number of points outside any 
neighbourhood of a limit point it follows that a sequence may have more than 
one limit point. We shall show by example that a limit point may or may not 
be a member of the sequence that defines it. This result when applied to 
sequences with only one limit point will later be seen to be very important, 
since it provides the justification for the approximation to irrational numbers 
in calculations by rational numbers. In sequences involving only one limit 
point the sequence will be said to converge to the value associated with the 
limit point. This value will be called the limit of the sequence. 

Not all sequences have limit points and the following examples exhibit 
sequences having three, one, and no limit points, respectively. 



Example 3-2

(a) $\{\sin[\pi(n^2 + 1)/(2n)]\}$ has the three limit points -1, 0, and 1, of which
0 is a member of the sequence and the other two are not. The sequence
does not converge.

(b) $\{(1/n)\sin(n\pi/2)\}$ has only one limit point at zero which is a member
of the sequence. The sequence converges to zero.

(c) $\{n^2\}$ has no limit point and so the sequence does not converge.
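
The three limit points in Example 3-2 (a) can be made plausible numerically. The sketch below assumes that the general term is $\sin[\pi(n^2 + 1)/(2n)]$ and groups terms far along the sequence by the residue of n modulo 4; the function names are hypothetical, and floating-point values only approximate the true terms.

```python
import math


def u(n: int) -> float:
    """Assumed general term of Example 3-2 (a): sin(pi (n^2 + 1) / (2n))."""
    return math.sin(math.pi * (n * n + 1) / (2 * n))


def tail_values(k: int) -> dict:
    """Terms u_n for n = 4k + r, r = 0, 1, 2, 3 (k >= 1), keyed by the residue r."""
    return {r: u(4 * k + r) for r in range(4)}


if __name__ == "__main__":
    # For large k the four sub-sequences settle near 0, 1, 0 and -1 respectively.
    for k in (10, 100, 2500):
        print(k, tail_values(k))
```

Each residue class defines a sub-sequence in the sense used above, and the values about which these sub-sequences cluster are the limit points of the full sequence.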

One of the most important applications of the notion of a sequence is to
the study of series. The difficulty here is to give a meaning to the sum of an
infinite number of terms. What, for example, is the meaning of

$$\sum_{n=1}^{\infty} \frac{1}{n!}\,? \qquad \text{(A)}$$

The solution is to be found in the behaviour of the sequence $\{s_m\}$ defined by

$$s_m = \sum_{n=1}^{m} \frac{1}{n!}.$$

The first few terms of the sequence $\{s_m\}$ are

$$s_1 = 1, \quad s_2 = 1 + \frac{1}{2!}, \quad s_3 = 1 + \frac{1}{2!} + \frac{1}{3!}, \quad s_4 = 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!}$$

and obviously all such terms $s_m$ will only involve the sum of a finite number
of numbers. For obvious reasons $s_m$ is called the mth partial sum of the series
(A). The interpretation of the infinite sum (A) is to be found in the behaviour
of the Nth term of $\{s_m\}$, namely the Nth partial sum $s_N$, as N tends to infinity.
If $\{s_m\}$ has only one limit point at which $s_m$ tends to some number S, then this
will be called the sum of the series. If S is infinite the series will be said to


SEC 3-2 LIMITS OF SEQUENCES / 79 

diverge. A moment's reflection will show the reader that this is the practical
approach to the problem, since the term $s_N$ is the sum of the first N terms of
the infinite series (A), and it seems reasonable to assume that when the value
of (A) is finite, it must be close to the value $s_N$, when N is suitably large.

These preliminary ideas on series must suffice for now, but we shall take 
them up again later and devise tests to determine whether series are convergent 
or divergent. 
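
The idea of a sequence of partial sums is easily sketched in code. The example below takes, purely as an illustrative choice, the series whose nth term is 1/n!; the function names are hypothetical and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from math import factorial


def partial_sums(term, count):
    """Return [s_1, ..., s_count], where s_m = term(1) + ... + term(m)."""
    sums, running = [], Fraction(0)
    for n in range(1, count + 1):
        running += term(n)
        sums.append(running)
    return sums


def reciprocal_factorial(n):
    """Illustrative nth term 1/n! (an assumed example series)."""
    return Fraction(1, factorial(n))


if __name__ == "__main__":
    for m, s in enumerate(partial_sums(reciprocal_factorial, 8), start=1):
        print(m, s, float(s))
```

Each partial sum involves only finitely many terms, so every $s_m$ is an ordinary number; the question of convergence concerns only the behaviour of the list as m grows.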

3-2 Limits of sequences 

The term limit was first introduced intuitively in the previous section in
connection with a sequence $\{u_n\}$ which had only one limit point. As n increases so
the points representing the terms $u_n$ cluster ever closer to the limit point
whose value L, say, is the limit of the sequence. This idea of a limit is correct
in spirit but it is not very satisfactory from the mathematical manipulative
point of view since the phrase 'cluster ever closer to' is far too vague. The
difficulty of making the expression 'limit' precise is connected with the exact
meaning we give to this phrase.

Our difficulty can be resolved if we recall that any neighbourhood of a
limit point will contain an infinite number of points of the sequence and,
if there is only one limit point, will exclude only a finite number of points!
Thinking in terms of numbers rather than points, a neighbourhood of a limit
point is simply an open interval of the line on which the numbers $u_n$ are
plotted and we already have a notation for representing such an interval.
Suppose, for convenience, that the neighbourhood is symmetrical about the
number L and of width $2\varepsilon$, where $\varepsilon$ is some arbitrarily small positive number.
Then a variable u will be inside this neighbourhood if $L - \varepsilon < u < L + \varepsilon$.
Recalling the definition of 'absolute value', this inequality can be rewritten
concisely as $|u - L| < \varepsilon$. Different values of $\varepsilon > 0$ determine different
neighbourhoods, and if u is identified with the term $u_n$ of the sequence, then
L is the limit of the sequence if, no matter how small $\varepsilon$ may become, only a
finite number of terms $u_n$ lie outside the neighbourhood and an infinite
number lie within it.

We can now give a proper definition of a limit. 

Definition 3-3 The sequence $\{u_n\}$ will be said to tend to the limit L if,
and only if, for any arbitrarily small positive number $\varepsilon$, there exists an integer
N such that

$$n > N \Rightarrow |u_n - L| < \varepsilon.$$

Let us test our definition on the sequence $\{u_n\}$ with $u_n = 1 + (-1)^n/n$.
We already know that this sequence has only one limit point at the value
unity, and consequently our definition should show that the limit is unity.
Suppose, for the sake of argument, that we check to see that the definition is 






satisfied if $\varepsilon = 1/100$. To do this we must find a number N such that when
n > N we have

$$\left|\left(1 + \frac{(-1)^n}{n}\right) - 1\right| < \frac{1}{100}.$$

This result is obviously equivalent to the requirement that (1/n) < 1/100
which will be true for any value of n greater than 100. Hence if we take
N = 100 the conditions of the definition are satisfied. There are thus 100
terms outside the neighbourhood and an infinite number within it.

Had we demanded a much smaller value of $\varepsilon$, say $\varepsilon = 10^{-6}$, the identical
argument would have shown that the definition is satisfied if $N = 10^6$.
There would now be a very large number of terms outside the neighbourhood
$0.999999 < u_n < 1.000001$, in fact $10^6$ in all, but this is still a finite number
whereas the number of terms within the neighbourhood is still infinite.
Clearly, however small the value of $\varepsilon$, the conditions of the definition will still
apply showing that it is in accord with our earlier intuitive ideas.
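
The counting argument used here can be checked directly. The following sketch, with hypothetical function names and exact rational arithmetic, lists the suffixes n up to a chosen cut-off for which the inequality $|u_n - L| < \varepsilon$ fails, taking $u_n = 1 + (-1)^n/n$ and L = 1 as in the text.

```python
from fractions import Fraction


def u(n):
    """General term u_n = 1 + (-1)^n / n, held as an exact fraction."""
    return 1 + Fraction((-1) ** n, n)


def terms_outside(eps, limit, n_max):
    """Suffixes n <= n_max for which |u_n - limit| < eps is NOT satisfied."""
    return [n for n in range(1, n_max + 1) if not abs(u(n) - limit) < eps]


if __name__ == "__main__":
    outside = terms_outside(Fraction(1, 100), 1, 1000)
    # The largest excluded suffix is an admissible choice of N in Definition 3-3.
    print(len(outside), max(outside))
```

A finite search of this kind cannot replace the argument in the text, since it examines only the suffixes up to the cut-off; it merely confirms the count of excluded terms for a given $\varepsilon$.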

In general, when the sequence $\{u_n\}$ has a limit L, so that we say it converges
to L, we shall write

$$\lim_{n \to \infty} u_n = L.$$



Whenever using this notation for a limit the reader must always keep in 
mind the underlying formal definition just given. 

The definition and the illustrative example just given show that when a
sequence has only one limit point, then it must converge to the value associated
with that limit point. Any sequence such as $\{u_n\}$ with $u_n = \sin[\pi(n^2 + 1)/(2n)]$
cannot have a limit, for it has three limit points at -1, 0, and 1 and any small
neighbourhood taken about any one must, of necessity, exclude the infinitely
many terms associated with the other two. Such a sequence does not converge.

Frequently the limit of a sequence is of more importance than its individual
terms, and in such circumstances the notation $\lim_{n \to \infty} u_n$ is advantageous in that
it focusses attention on the general term $u_n$ of the sequence. The result of the
limiting operation is often readily deduced from the general term as these
examples indicate.

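Example 3-3 Determine the limits in each of the following:

(a) $\lim_{n \to \infty} \left[\dfrac{(2n - 1)(n + 4)(n - 2)}{n^3}\right]$;

(b) $\lim_{n \to \infty} \left[\dfrac{1}{n^2} + \dfrac{2}{n^2} + \cdots + \dfrac{n - 1}{n^2}\right]$;

(c) $\lim_{n \to \infty} \left[\dfrac{5^{n+1} + 7^{n+1}}{5^n - 7^n}\right]$;

(d) $\lim_{n \to \infty} \left[\dfrac{1^2 + 2^2 + 3^2 + \cdots + n^2}{n^2}\right]$.
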

Solution (a) The general term is $u_n = [(2n - 1)(n + 4)(n - 2)]/n^3$, so that
expanding the numerator and dividing by $n^3$ gives

$$u_n = 2 + \frac{3}{n} - \frac{18}{n^2} + \frac{8}{n^3}.$$

Obviously, as n increases, the last three terms comprising $u_n$ approach zero,
and in the limit we have

$$\lim_{n \to \infty} \left[\frac{(2n - 1)(n + 4)(n - 2)}{n^3}\right] = 2.$$

Solution (b) The general term is $u_n = [1 + 2 + \cdots + (n - 1)]/n^2$, in
which the numerator is the sum of an arithmetic progression. Now it is
readily verified that $1 + 2 + \cdots + (n - 1) = n(n - 1)/2$ so that

$$u_n = \frac{1}{2}\left(1 - \frac{1}{n}\right).$$

Using the same argument as in (a) above we see at once that as n increases
so $u_n$ approaches the value $\tfrac{1}{2}$, whence

$$\lim_{n \to \infty} \left[\frac{1}{n^2} + \frac{2}{n^2} + \cdots + \frac{n - 1}{n^2}\right] = \frac{1}{2}.$$



Solution (c) The general term here is $u_n = (5^{n+1} + 7^{n+1})/(5^n - 7^n)$ and by
dividing numerator and denominator by $7^n$ it may be written:

$$u_n = \frac{5(5/7)^n + 7}{(5/7)^n - 1}.$$

Now 5/7 < 1 so that $(5/7)^n$ will tend to zero as n increases. Thus $u_n$ will
approach the value -7. In this case we may write

$$\lim_{n \to \infty} \left[\frac{5^{n+1} + 7^{n+1}}{5^n - 7^n}\right] = -7.$$



Solution (d) The general term is $u_n = [1^2 + 2^2 + \cdots + n^2]/n^2$, in which
the numerator is the sum of the squares of the first n natural numbers. Using
the familiar result

$$1^2 + 2^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}$$

enables us to write

$$u_n = \frac{(n + 1)(2n + 1)}{6n}.$$

It is obvious that the numerator is quadratic in n whereas the denominator
is first degree or linear in n. Hence as n increases without bound, so will $u_n$.
This sequence diverges and we write

$$\lim_{n \to \infty} \left[\frac{1^2 + 2^2 + \cdots + n^2}{n^2}\right] \to \infty.$$



Notice that we do not use the equality sign in connection with the symbol 
oo, in accordance with the idea that infinity is not an actual number but 
essentially a limiting process. 
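
As an informal check on the four results of Example 3-3, the general terms can be evaluated exactly for large n. The sketch below uses hypothetical function names `u_a` to `u_d` and the standard `fractions` module; it illustrates the limiting behaviour but does not prove it.

```python
from fractions import Fraction


def u_a(n):
    """(2n - 1)(n + 4)(n - 2) / n^3, expected to approach 2."""
    return Fraction((2 * n - 1) * (n + 4) * (n - 2), n ** 3)


def u_b(n):
    """[1 + 2 + ... + (n - 1)] / n^2, expected to approach 1/2."""
    return Fraction(sum(range(1, n)), n ** 2)


def u_c(n):
    """(5^(n+1) + 7^(n+1)) / (5^n - 7^n), expected to approach -7."""
    return Fraction(5 ** (n + 1) + 7 ** (n + 1), 5 ** n - 7 ** n)


def u_d(n):
    """[1^2 + 2^2 + ... + n^2] / n^2, which increases without bound."""
    return Fraction(sum(k * k for k in range(1, n + 1)), n ** 2)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, float(u_a(n)), float(u_b(n)), float(u_c(n)), float(u_d(n)))
```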

Before continuing our discussion of limits, let us introduce a useful
notation. In the examples above it is apparent that the value of the limit of a
sequence involving the ratio of two expressions as n increases, is entirely
determined by the ratio of the most significant terms in the numerator and
denominator. In the case of a polynomial involving n, the most significant
term as n increases is obviously the highest degree term in which it appears.
Thus in (a), an inspection of the brackets in the numerator shows the most
significant term to be $2n^3$, and as the denominator only involves $n^3$, it is at
once obvious that for large n the ratio will approach $(2n^3/n^3) = 2$.

To streamline limiting arguments of this type, and yet to preserve some- 
thing of the effect of the less significant terms, we now introduce the so-called 
'big oh' notation appropriate to functions. 

Definition 3-4 We say that function f(x) is of the order of the function
g(x), written $f(x) = O(g(x))$ if, for some set of values of x

(a) $g(x) > 0$
and

(b) $|f(x)| < Mg(x)$,
where M is some constant.

The value of the constant M is usually unimportant as for most arguments
it suffices that such an M should exist. We have these obvious results:

$$2x^3 + 2x + 1 = O(x^3),$$
$$3x + \sin x = O(x),$$
$$\sin x = O(1),$$

where the symbol O(1) has been used to denote a constant.
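
Definition 3-4 asks only that some constant M should exist, and a numerical spot check can make particular choices plausible. In the sketch below the helper `bounded_by`, the sample points, and the constants M = 6, 5, and 2 are all assumptions made for illustration; the check covers sampled values of $x \geq 1$ only.

```python
import math


def bounded_by(f, g, M, xs):
    """True if g(x) > 0 and |f(x)| < M * g(x) at every sample point x in xs."""
    return all(g(x) > 0 and abs(f(x)) < M * g(x) for x in xs)


# Sample points x = 1, 1.5, 2, ..., 100.5 (an arbitrary illustrative choice).
xs = [1 + 0.5 * k for k in range(200)]

checks = {
    "2x^3 + 2x + 1 = O(x^3)": bounded_by(
        lambda x: 2 * x ** 3 + 2 * x + 1, lambda x: x ** 3, 6, xs),
    "3x + sin x = O(x)": bounded_by(
        lambda x: 3 * x + math.sin(x), lambda x: x, 5, xs),
    "sin x = O(1)": bounded_by(math.sin, lambda x: 1.0, 2, xs),
}

if __name__ == "__main__":
    for statement, ok in checks.items():
        print(statement, ok)
```

By contrast, no fixed M will serve for $x^3$ against $x^2$ over these samples, since the ratio of the two functions grows like x.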

In terms of this notation we may write the general term $u_n$ in Example 3-3 (a)
in the simplified form

$$u_n = \frac{2n^3 + O(n^2)}{n^3}, \quad \text{whence} \quad u_n = 2 + \frac{O(n^2)}{n^3}. \qquad \text{(A)}$$

By virtue of the definition of the symbol 'big oh', $O(n^2)$ implies an expression
that is bounded in magnitude by $Mn^2$, so that $|O(n^2)/n^3| < (Mn^2)/n^3 = M/n$. However,
$M/n \to 0$ as n increases without bound, so that

$$\lim_{n \to \infty} u_n = 2. \qquad \text{(B)}$$

Normally the argument just outlined would be omitted, so that result (B)
would be written down immediately after (A).

Implicit in the examples just examined are results which we now combine.

Theorem 3-1 If it can be shown that $u_1, u_2, u_3, \ldots$ and $v_1, v_2, v_3, \ldots$
are two sequences such that $\lim_{n \to \infty} u_n = L$ and $\lim_{n \to \infty} v_n = M$, then

(a) $u_1 + v_1, u_2 + v_2, u_3 + v_3, \ldots$ is a sequence such that
$\lim_{n \to \infty} (u_n + v_n) = L + M$;

(b) $u_1 v_1, u_2 v_2, u_3 v_3, \ldots$ is a sequence such that $\lim_{n \to \infty} u_n v_n = LM$;

(c) provided $M \neq 0$, $u_1/v_1, u_2/v_2, u_3/v_3, \ldots$ is a sequence such that
$\lim_{n \to \infty} (u_n/v_n) = L/M$.

These assertions are virtually self-evident and so we prove only the first
result, making full use of our definition of a limit and of the triangle inequality
of Theorem 1-4.

Suppose $\varepsilon$ is given. Then because $\{u_n\}$ converges to the limit L, there
exists a number $N_1$ such that $n > N_1 \Rightarrow |u_n - L| < \tfrac{1}{2}\varepsilon$. By the same
argument there exists another number $N_2$ such that $n > N_2 \Rightarrow |v_n - M| < \tfrac{1}{2}\varepsilon$.

Now $|(u_n + v_n) - (L + M)| = |(u_n - L) + (v_n - M)| \leq |u_n - L| + |v_n - M|$,
and so $n > \max(N_1, N_2) \Rightarrow |(u_n + v_n) - (L + M)| < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon$. Thus, taking
$N = \max(N_1, N_2)$, and given an arbitrarily small positive number $\varepsilon$, we have

$$n > N \Rightarrow |(u_n + v_n) - (L + M)| < \varepsilon$$

or

$$\lim_{n \to \infty} (u_n + v_n) = L + M.$$

In effect, this theorem justifies any argument in which it is asserted that,
if a is close to A and b is close to B, then a + b is close to A + B, ab is close
to AB, and, provided b and $B \neq 0$, a/b is close to A/B.

Theorem 3-2 Let $\{u_n\}$ and $\{v_n\}$ be two sequences which both converge to
the same limit L, and suppose $\{w_n\}$ to be a third sequence. Then if for all n
greater than some fixed value N, it is true that $u_n \leq w_n \leq v_n$, the sequence
$\{w_n\}$ converges. Furthermore, the limit of the sequence $\{w_n\}$ is also L.




The proof of this theorem is not difficult and so is left to the reader as an
exercise. In essence it involves two stages. The first is to establish that
$\{u_n - w_n\}$ and $\{w_n - v_n\}$ are both null sequences in the sense that they
converge to the limit zero. The second involves the use of Theorem 3-1 (a) to
establish that these two null sequences imply $\lim w_n = L$.

In applications use of this theorem is often confined to proving that a
given sequence $\{w_n\}$ converges, so that the sequences $\{u_n\}$ and $\{v_n\}$ then need
to be devised to satisfy the conditions of the theorem.

Example 3-4 Given that

$$w_n = 1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} + \frac{1}{3 \cdot 2^n},$$

use Theorem 3-2 to prove that the sequence $\{w_n\}$ converges and to find the
limit.

Now, obviously

$$1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} < w_n < 1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} + \frac{1}{2^n},$$

and so using the expression for the sum of a geometric progression we may
write

$$2[1 - (\tfrac{1}{2})^n] < w_n < 2[1 - (\tfrac{1}{2})^{n+1}].$$

Thus for the sequence $\{u_n\}$ we take $u_n = 2[1 - (\tfrac{1}{2})^n]$ and for the sequence
$\{v_n\}$ we take $v_n = 2[1 - (\tfrac{1}{2})^{n+1}]$. The conditions of the theorem are then
satisfied, since $\lim u_n = \lim v_n = 2$. Hence the sequence $\{w_n\}$ converges
and has for its limit the value 2.
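
The bracketing used in Example 3-4 can be verified term by term. In this sketch the names `w`, `lower`, and `upper` are hypothetical, and exact fractions are used so that the strict inequalities are tested without rounding error.

```python
from fractions import Fraction


def w(n):
    """w_n = 1 + 1/2 + ... + 1/2^(n-1) + 1/(3 * 2^n)."""
    return sum(Fraction(1, 2 ** k) for k in range(n)) + Fraction(1, 3 * 2 ** n)


def lower(n):
    """u_n = 2[1 - (1/2)^n], the lower bracketing sequence."""
    return 2 * (1 - Fraction(1, 2) ** n)


def upper(n):
    """v_n = 2[1 - (1/2)^(n+1)], the upper bracketing sequence."""
    return 2 * (1 - Fraction(1, 2) ** (n + 1))


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(n, float(lower(n)), float(w(n)), float(upper(n)))
```

Because both bracketing sequences approach 2, the printed middle column is squeezed towards the same value, which is the content of Theorem 3-2.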

At this stage in our discussion of sequences the following result should be 
self evident and we state it in the form of a postulate, rather than prove it. 

Postulate Every increasing sequence which is bounded above tends to
a limit.

The proof of this postulate is outlined in Problem 3.20 at the end of the 
chapter. The details are left to the reader, together with the task of showing 
the consequence that every decreasing sequence which is bounded below must 
also tend to a limit. 

It is this postulate that validates the usual arithmetic procedure for finding
a square root. In the procedure an additional digit is added to the approximation
at each stage, thereby giving rise to an increasing sequence that is
bounded above. With a number such as $\sqrt{2}$ which we know to be irrational,
this same postulate also justifies its successive approximation by the increasing
sequence $\{u_n\}$ of rational numbers 1, 1.4, 1.41, 1.414, 1.4142, . . ., $u_n$, . . ..
In this case an irrational number $\sqrt{2}$ is determined as the limit of a sequence
of rationals. The implications are important, since although irrational
numbers are of frequent occurrence, in the world in which we live we can
only undertake practical calculations using rationals!

Not all sequences are defined explicitly by giving an expression for the
general term $u_n$. Often a sequence is defined recursively by giving a formula
relating the term $u_n$ to its predecessor $u_{n-1}$, and then specifying the value of
$u_1$. This is, of course, a difference equation, but in this context it is customary
to call any rule of this kind a recurrence relation, and one of considerable
computational importance is

$$u_n = \frac{1}{m}\left[(m - 1)u_{n-1} + \frac{a}{(u_{n-1})^{m-1}}\right],$$

where m is an integer greater than unity.

The particular significance of this recurrence relation stems from the fact
that by using Theorem 3-2 it is not difficult to prove the rather surprising
result that $\{u_n\}$ always converges to the limit $\sqrt[m]{a}$, irrespective of the choice
of $u_1$ provided only that it is positive. The value of the limit is obvious once
convergence has been established, for denoting it by L and setting $u_{n-1} = u_n = L$,
it follows directly from the recurrence relation that $L^m = a$.

Table 3-1 shows the effectiveness of this method as a computational
procedure or algorithm for computing $\sqrt{2}$ to five figures, using three different
starting values for $u_1$. To use the relation to compute $\sqrt{2}$ we must first set
m = 2 and a = 2 when it becomes

$$u_n = \frac{1}{2}\left[u_{n-1} + \frac{2}{u_{n-1}}\right].$$

Taking as representative the three starting values $u_1$ = 1, 1.4, and 5, we
obtain Table 3-1 in which a dash signifies that no further change occurs in the
last digit.



Table 3-1

n      u_n (u_1 = 1)      u_n (u_1 = 1.4)      u_n (u_1 = 5)

1      1                  1.4                  5
2      1.5                1.41429              2.7
3      1.41667            1.41421              1.72037
4      1.41422            -                    1.44146
5      1.41421            -                    1.41447
6      -                  -                    1.41421





Obviously convergence is most rapid when the value assumed for $u_1$ is a
good approximation to the answer, and much effort may be spared by taking
a sensible starting approximation.
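
The entries of Table 3-1 can be regenerated with a few lines of code. The sketch below assumes that the recurrence has the Newton-type form $u_n = \frac{1}{m}[(m - 1)u_{n-1} + a/(u_{n-1})^{m-1}]$, which for m = 2 and a = 2 reduces to the relation used for the table; the function name `root_sequence` and its parameters are hypothetical, and ordinary floating-point arithmetic is used.

```python
def root_sequence(a: float, m: int, u1: float, count: int) -> list:
    """Return [u_1, ..., u_count] for the assumed recurrence
    u_n = ((m - 1) * u_(n-1) + a / u_(n-1)**(m - 1)) / m, with u_1 > 0."""
    terms = [u1]
    for _ in range(count - 1):
        prev = terms[-1]
        terms.append(((m - 1) * prev + a / prev ** (m - 1)) / m)
    return terms


if __name__ == "__main__":
    # The three starting values of Table 3-1, each run for six terms.
    for start in (1, 1.4, 5):
        print(start, [round(t, 5) for t in root_sequence(2, 2, start, 6)])
```

Changing `m` and `a` gives other roots under the same assumed formula, for instance `root_sequence(27, 3, 1, 40)` for a cube root.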

3-3 The number e 

Later we shall use an important mathematical constant that is always denoted 
by the symbol e. This number is both irrational and transcendental, and for 
reference purposes its value to ten decimal places is 

e = 2.7182818284.

There are numerous different ways of defining this constant, but although 
these are interesting, our real concern later in this book will be with the 
mathematical use of the constant e. We shall, for example, see how it is of 
fundamental importance in the study of differential equations and in the 
definition of important mathematical functions like the natural logarithm 
and the hyperbolic functions sinh x, cosh x, and tanh x. 

However, the real purpose of this section will not be to study these 
applications, but to examine one interesting definition of e as the limit of a 
particular sequence. This problem provides both a first encounter with e, 
and also a useful illustration of how approximate information may be ex- 
tracted from the properties of a difficult sequence. We shall prove that if 



$$e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n$$  (3-1)



then 2 < e < 3. The problem of determining e correctly to any given number 
of figures will be deferred until we are better equipped for the task. 
Consider the sequence $\{u_n\}$ with the general term

$$u_n = \left(1 + \frac{1}{n}\right)^n.$$

We will first establish that $\{u_n\}$ is a strictly increasing sequence, so
that $u_{n+1} > u_n$, and then show that the sequence $\{u_n\}$ is bounded above
by the number 3. The postulate of the previous section then establishes that the
limit $e$ exists and is such that $e < 3$. Finally, the lower bound 2 will be
added as a trivial consequence of the proof used to establish the upper bound.
First let us expand $u_n$ by the binomial theorem:



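$$\left(1 + \frac{1}{n}\right)^n = 1 + n\left(\frac{1}{n}\right) + \frac{n(n-1)}{2!}\left(\frac{1}{n}\right)^2 + \frac{n(n-1)(n-2)}{3!}\left(\frac{1}{n}\right)^3 + \cdots + \frac{n(n-1)\cdots[n-(n-1)]}{n!}\left(\frac{1}{n}\right)^n.$$

Now rewrite this:

$$u_n = 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n}\right) + \frac{1}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) + \cdots + \frac{1}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{n-1}{n}\right).$$  (3-2)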

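An exactly similar argument applied to $u_{n+1}$ then gives

$$u_{n+1} = 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n+1}\right) + \frac{1}{3!}\left(1 - \frac{1}{n+1}\right)\left(1 - \frac{2}{n+1}\right) + \cdots + \frac{1}{(n+1)!}\left(1 - \frac{1}{n+1}\right)\left(1 - \frac{2}{n+1}\right)\cdots\left(1 - \frac{n}{n+1}\right).$$

Now all the terms in $u_n$ and $u_{n+1}$ are positive and $u_{n+1}$ has one more
term than $u_n$. In addition, terms in $u_{n+1}$ that are associated with
factorials are larger than the corresponding terms in $u_n$ because of the
obvious inequalities

$$\left(1 - \frac{r}{n+1}\right) > \left(1 - \frac{r}{n}\right), \qquad r = 1, 2, \ldots, n-1.$$

Hence $u_{n+1} > u_n$, showing that $\{u_n\}$ is a strictly increasing sequence.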

To show that $\{u_n\}$ is bounded above we must try to sum the finite series
for $u_n$ and then examine the behaviour of the sum as $n$ increases. As the
finite series (3-2) stands we can make no progress, but an overestimate of
this sum can easily be obtained if the terms of the series are simplified. This
approach will suffice for our purposes, since to prove that the limit $e$
exists, we only need to prove that $\{u_n\}$ is strictly increasing and bounded
above; a strict upper bound is not necessary here. It is only needed when the
exact value of the limit is to be determined.

If we use the obvious inequalities

$$1 > \left(1 - \frac{1}{n}\right) > \left(1 - \frac{2}{n}\right) > \left(1 - \frac{3}{n}\right) > \cdots$$

it follows at once from Eqn (3-2) that

$$u_n < 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!}.$$  (3-3)

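This is still too difficult to sum explicitly, so using the observation:

$$\frac{1}{3!} < \frac{1}{2^2}, \quad \frac{1}{4!} < \frac{1}{2^3}, \quad \ldots, \quad \frac{1}{n!} < \frac{1}{2^{n-1}},$$

we further simplify Eqn (3-3) to the form

$$u_n < 1 + 1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots + \frac{1}{2^{n-1}}.$$  (3-4)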



This can now be summed, since after the first term the remaining terms
form a geometric progression. We arrive at the result

$$u_n < 1 + \frac{1 - (\tfrac{1}{2})^n}{1 - \tfrac{1}{2}},$$

whence $\lim_{n\to\infty} u_n < 3$.

The conditions of our postulate are satisfied, so we may conclude that
$\{u_n\}$ has a finite limit $e$ and, furthermore, that $e < 3$. Examination of
Eqn (3-2) shows that $u_n > 2$ for all $n > 1$ so that finally we have
established our claim that

2 < e < 3. 

The form of argument used to overestimate series (3-2) is often useful and 
the final inequality (3-4) is usually called a majorizing series. 
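
The chain of inequalities just established can be checked numerically. The
following Python sketch is offered only as an illustration; the function names
u, factorial_bound and geometric_bound are our own and do not appear in the
text. They evaluate $u_n$ together with the right-hand sides of (3-3) and (3-4)
for a few values of n.

```python
from math import factorial


def u(n):
    """General term u_n = (1 + 1/n)**n."""
    return (1 + 1 / n) ** n


def factorial_bound(n):
    """Right-hand side of (3-3): 1 + 1 + 1/2! + ... + 1/n!."""
    return sum(1 / factorial(k) for k in range(n + 1))


def geometric_bound(n):
    """Right-hand side of (3-4): 1 + 1 + 1/2 + ... + 1/2**(n-1)."""
    return 1 + sum(0.5 ** k for k in range(n))


for n in (2, 3, 5, 10, 20):
    print(n, u(n), factorial_bound(n), geometric_bound(n))
```

For each n shown the three printed numbers should increase from left to right
and remain below 3, while the first column itself increases with n.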

Closely related to limit (3-1) is the sequence $\{v_n(x)\}$ with general term

$$v_n(x) = \left(1 + \frac{x}{n}\right)^n.$$  (3-5)

To establish the relationship that exists between $e$ and the limit of
$\{v_n(x)\}$ let us first denote the limit by $E(x)$, so that

$$E(x) = \lim_{n\to\infty}\left[\left(1 + \frac{x}{n}\right)^n\right].$$  (3-6)



Suppose $x > 0$ to be any rational number and define an increasing sequence
$\{n_k\}$ of natural numbers by the requirement that the numbers $n_k/x$ are
integral. Henceforth we shall set $N_k = n_k/x$. Then by restricting $n$ to be a
member of $\{n_k\}$ we may define a sub-sequence $\{v_{n_k}(x)\}$ of
$\{v_n(x)\}$ for which Eqn (3-5) may be written in the form

$$v_{n_k}(x) = \left(1 + \frac{1}{N_k}\right)^{N_k x} = \left[\left(1 + \frac{1}{N_k}\right)^{N_k}\right]^x.$$  (3-7)



Using the definition of $u_n$ we see that

$$v_{n_k}(x) = (u_{N_k})^x,$$

so that taking the limit as $n_k \to \infty$ we have

$$E(x) = \lim_{n_k\to\infty} v_{n_k}(x) = \left[\lim_{N_k\to\infty} u_{N_k}\right]^x = e^x.$$

Whence the important result

$$E(x) = e^x.$$  (3-8)

With a more subtle argument it can be established that Eqn (3-8) is
generally true without the restriction of $n$ to the sequence $\{n_k\}$. This
implies that the result is true for all real $x$.
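
Result (3-8) is easily illustrated by direct computation. The short Python
sketch below is ours and rests on the assumption that the library function
math.exp may be taken as a reference value for $e^x$; the name v mirrors the
general term (3-5). It tabulates $v_n(x)$ for increasing n against that
reference.

```python
import math


def v(n, x):
    """General term (3-5): (1 + x/n)**n."""
    return (1 + x / n) ** n


for x in (0.5, 1.0, 2.0, -1.0):
    approximations = [v(n, x) for n in (10, 1000, 100000)]
    print(x, approximations, math.exp(x))
```

The approach to the limit is slow, the error falling off roughly in proportion
to 1/n, which is one reason why the accurate computation of e is deferred.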




Fig. 3-3 Graph of the functions $e^x$ and $e^{-x}$.

The function $e^x$ is one of the most important functions in mathematics
and it is called the exponential function. Fig. 3-3 shows its behaviour with
$x$. Notice that it is an essentially positive function which is strictly
monotonic increasing with $x$. Also shown on the figure is the associated
function $e^{-x}$.

3-4 Limits of functions— continuity 

The notion of the limit of a function $f(x)$ as $x$ tends towards some value
$a$ is intuitively obvious in the case of functions whose graph is an unbroken
curve. A typical function of this kind is illustrated in Fig. 3-4 from which it
is easily seen that if $x$ is considered to be a moving point, then $f(x)$ will
approach the value $f(a)$ as $x$ approaches $a$ from either the left or the
right. In this case $f(x)$ actually attains the value $f(a)$, and we shall speak
of $f(a)$ as the 'limit of $f(x)$ as $x$ tends to $a$' and write

$$\lim_{x\to a} f(x) = f(a).$$

Fig. 3-4 Function $f(x)$ with unbroken graph.



Thus, if $f(x) = x^3 - 2x^2 + x + 3$, then clearly in this case
$\lim_{x\to 2} f(x) = 5 = f(2)$. A slightly less obvious example involves
finding $\lim_{x\to 1} f(x)$ when

$$f(x) = \frac{\sqrt{x} - 1}{x - 1},$$

since the formal substitution of $x = 1$ in $f(x)$ seems to yield 0/0 which is
meaningless as it stands. The difficulty here is easily resolved by cancelling a
factor $(\sqrt{x} - 1)$ in the numerator and denominator to give

$$f(x) = \frac{1}{\sqrt{x} + 1}$$

from which it is apparent that $\lim_{x\to 1} f(x) = \tfrac{1}{2}$.
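
As a numerical illustration of this 0/0 example, the Python sketch below (our
own, with the hypothetical names f and f_cancelled) evaluates the quotient
close to x = 1 from both sides, alongside the form obtained after cancelling
the common factor.

```python
import math


def f(x):
    """Original quotient; undefined at x = 1 itself."""
    return (math.sqrt(x) - 1) / (x - 1)


def f_cancelled(x):
    """Form obtained after cancelling the factor (sqrt(x) - 1)."""
    return 1 / (math.sqrt(x) + 1)


for h in (1e-1, 1e-3, 1e-5):
    print(h, f(1 - h), f(1 + h), f_cancelled(1 + h))
```

Both forms agree for x different from 1, and the printed values settle towards
one half as h decreases, although only the cancelled form can be evaluated at
x = 1.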

In effect, the intuitive notion involved in the limit of a function is
essentially the same as that for the limit of a sequence. Namely, we say that
$L$ is the limit of $f(x)$ as $x$ tends to $a$ if, for all $x$ sufficiently
close to $a$, $f(x)$ is close to $L$. In fact, the determination of the value of
the limit $L$ involves the behaviour of $f(x)$ near to $x = a$, but does not
consider the actual value of $f(x)$ at $x = a$. Whether or not $f(a)$ is
actually equal to $L$, as was the case above, is immaterial. By only slightly
modifying our definition of the limit of a sequence, we arrive at the following
definition of the limit of a function, which is illustrated in Fig. 3-5, and
will be used for our subsequent discussion of continuity.

Fig. 3-5 Function $f(x)$ has a smooth graph and attains the limit $L$ at
$x = a$. (Marked on the figure: domain $0 < |x - a| < \delta$ and domain
$0 < |x - b| < \delta'$.)

Definition 3-5 The function $f(x)$ will be said to tend to the limit $L$ as
$x$ tends to $a$ if, and only if, for any arbitrarily small positive number
$\varepsilon$, there exists a small positive number $\delta$ such that

$$0 < |x - a| < \delta \Rightarrow |f(x) - L| < \varepsilon.$$

The significance of the condition $0 < |x - a| < \delta$ is that the value
$f(a)$ is specifically excluded from consideration as being irrelevant to the
determination of the limit. Thus, if

$$f(x) = \begin{cases} 1 + x^2 & \text{for } x \neq 1, \\ 5 & \text{for } x = 1, \end{cases}$$

then $\lim_{x\to 1} f(x) = 2$, despite the fact that $f(1) = 5$.



If the graph of a function $f(x)$ is not unbroken then more care must be
exercised when discussing the notion of a limit. The reason can be seen after
examination of Fig. 3-6 in which the graph has a break at $x = c$, at which
point the functional value $f(c)$ has been allocated arbitrarily. This graph
defines a perfectly satisfactory function, but as $x$ approaches $c$ from either
the left or the right, so $f(x)$ approaches either the value $L_-$ or $L_+$
which are obviously limits in some sense. Furthermore $L_- \neq L_+$ and neither
is equal to $f(c)$. To take account of this, we introduce the concepts of a
limit from the left and a limit from the right.

Fig. 3-6 Function $f(x)$ has broken graph.

To simplify the explanation we shall write $x \to a-$ in place of '$x$ tends to
$a$ from the left' and $x \to a+$ in place of '$x$ tends to $a$ from the right'.
In terms of this notation the function $f(x)$ in Fig. 3-6 has the property that
$\lim_{x\to c-} f(x) = L_-$ and $\lim_{x\to c+} f(x) = L_+$ which is indicated
in the diagram by means of arrows. Once again, in arriving at the limits from
the left and right of a point, the functional value itself at that point is not
involved. It may or may not equal one of the two limits so defined. These ideas
may be expressed formally as a definition.

Definition 3-6 The function $f(x)$ will be said to have the left-hand limit,
or limit from the left, $L_-$ as $x \to a-$ if, and only if, for any arbitrarily
small positive number $\varepsilon$, there exists a small positive number
$\delta$ such that

$$0 < a - x < \delta \Rightarrow |f(x) - L_-| < \varepsilon.$$

A corresponding definition exists for the right-hand limit, or limit from
the right, as $x \to a+$ in which $L_-$ is replaced by $L_+$ and the condition
$0 < a - x < \delta$ by $0 < x - a < \delta$.

Notice that the function $f(x)$ in Fig. 3-6 only has one-sided limits at
$x = a$ and $x = d$ and, even though $f(x)$ has a cusp at $x = b$, and so is not
smooth there, it nevertheless still has a limit in the ordinary sense at that
point. This is because of the following obvious result.

Theorem 3-3 If $f(x)$ has identical left- and right-hand limits at a point
$x = a$ so that $L_- = L_+ = L$, say, then $\lim_{x\to a} f(x)$ exists and is
also equal to $L$.

We shall usually resolve simple limit problems of the type just discussed 
either intuitively or, perhaps, by appeal to a graph. However, for complete- 
ness, we now apply the formal definition of a left-hand limit to a specific 
function to show, in principle, how it may be used as an analytical tool in 
less obvious situations. 

For our example we apply the formal definition of a left-hand limit at the 
point x = 1 to the function 



/(*) - {; 



for x < 1, 
forx>l. 



Clearly the left-hand limit at $x = 1$ is determined only by the behaviour
of $f(x)$ to the left of that point. The behaviour of $f(x)$ for $x > 1$ is
irrelevant to the determination of $\lim_{x\to 1-} f(x)$. Obviously, as
$x \to 1-$ so $x^2 \to 1$, and thus, intuitively, $\lim_{x\to 1-} f(x) = 1$.

If our intuitive argument is correct and this limit is in agreement with our
definition, we must show that for any $\varepsilon > 0$ we can find a positive
$\delta$, which will probably depend on $\varepsilon$, such that
$|x^2 - 1| < \varepsilon$ when $0 < 1 - x < \delta$ or, equivalently,
$1 - \delta < x < 1$.

We have $|f(x) - L_-| = |x^2 - 1| = |(x - 1)(x + 1)| = |x - 1| \cdot |x + 1|$,
but since $|x - 1| < \delta$ this becomes

$$|x^2 - 1| < \delta\,|x + 1|.$$  (A)

Since $x < 1$, we overestimate $x$ in (A) if we replace it by the value unity
(we may suppose $\delta < 1$, so that $x$ is positive) so that we have

$$|x^2 - 1| < 2\delta.$$  (B)

Finally, to make this expression less than any small positive number
$\varepsilon$, we need only make $2\delta < \varepsilon$. This finally proves
that $\lim_{x\to 1-} f(x) = 1$.

Some numbers might help here. Suppose, for example, we wish to find the
condition that $f(x)$ should be within 0.001 of the left-hand limit at $x = 1$.
This amounts to asking that $|x^2 - 1| < 0.001$, which is equivalent to setting
$\varepsilon = 0.001$. Hence, as $\delta < \tfrac{1}{2}\varepsilon = 0.0005$,
our x-inequality $1 - \delta < x < 1$ tells us that the required condition on
$f(x)$ will be satisfied provided $0.9995 < x < 1$.
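
The choice of $\delta$ made above may be tested by brute force. In the
following Python sketch, which is ours and not part of the original argument,
delta_for encodes the assumption $\delta = \tfrac{1}{2}\varepsilon$ (the
largest value allowed by (B)), and check_left_limit samples points strictly
inside the interval $1 - \delta < x < 1$.

```python
def delta_for(eps):
    """Assumed choice delta = eps / 2, the largest value permitted by (B)."""
    return eps / 2


def check_left_limit(eps, samples=1000):
    """True if |x**2 - 1| < eps at sampled points with 1 - delta < x < 1."""
    delta = delta_for(eps)
    # Midpoints of equal sub-intervals, so every x lies strictly inside.
    xs = [1 - delta * (k + 0.5) / samples for k in range(samples)]
    return all(abs(x * x - 1) < eps for x in xs)


for eps in (0.1, 0.001, 1e-6):
    print(eps, delta_for(eps), check_left_limit(eps))
```

A finite sample cannot of course replace the proof; it merely confirms that the
inequality behaves as predicted for the particular tolerances tried.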

In higher mathematics this analytical approach is indispensable but, as
already remarked, for our purposes a graphical approach to the limit of a
function must suffice in most cases. An exception is the discussion of
indeterminate forms which involve finding the limit of a quotient as $x$
approaches some value at which both numerator and denominator vanish. This will
be taken up again later as an application of calculus though the reader should
notice that we have already resolved one such simple problem involving a
limit of the form 0/0.

Although a function such as 



/!<*> = £ 



2+ 1 
3 



for x an integer 
for all other x 



is a perfectly satisfactory function from the mathematical point of view, it is 
not likely to occur in connection with physical problems. We make this 
assertion because in the physical world functional relationships are usually 
smoothly changing in the sense that a small change in the independent 
variable usually produces only a small change in the dependent variable. 
This is not always the case however and, for example, in gas flows involving a 
gas shock wave the gas pressure experiences a sudden jump across a geo- 
metrical surface in space called the shock front. Hence a graph of the gas 
pressure p across a plane shock at x = a, as a function of the distance x 
measured normal to the shock front, could appear as in Fig. 3-7. 



Fig. 3-7 Gas pressure $p$ as a function of distance normal to shock front at
$x = a$. (Marked on the figure: the shock front, and the pressure jump
$p_1 - p_2$ across it.)

Nevertheless, despite the existence of common physical situations of this
type a function as erratic as $f_1(x)$ is not likely to be encountered in the
real world. Aside from points at which a jump occurs, the 'reasonable' functions
that occur in physics and engineering must be expected to have the
smoothness-of-change property we described earlier.

This smoothness-of-change property is given the mathematical name 
continuity and plays an important part throughout all mathematical analysis. 
If the reader pauses to think for a moment he will see that the following 
definition describes continuity in terms of the left- and right-hand limits. 

Definition 3-7 The function $f(x)$ is said to be continuous at $x = x_0$ if:

(a) $\lim_{x\to x_0-} f(x) = \lim_{x\to x_0+} f(x) = L$

and

(b) $f(x_0) = L$.

In this definition, (a) demands the equality of the left- and right-hand
limits and (b) ensures that there is no 'gap' in the graph of $f(x)$ at
$x = x_0$. That is to say that the point $(x_0, f(x_0))$ lies on an unbroken
curve and so coincides with the limits (a). An alternative, but equivalent,
definition of continuity that is often used replaces (a) by the requirement that
$\lim_{x\to x_0} f(x) = L$ but still retains (b). Either form of definition is
equally good but we have chosen to emphasize the ideas of left- and right-hand
limits since they find important applications in engineering and physics.

Continuity essentially describes a property of a function in the neigh- 
bourhood of a point of interest and not just at the point itself. Accordingly, 
a function will be said to be continuous in the interval (a, b) if it is continuous 
at all points x within (a, b). 

Notice that the effect of condition (b) of our definition on a function such 
as 

-,./*» + 1 for * =£1 

^ ) = ( 6 for*=l 

is to show that $f(x)$ is continuous everywhere except at $x = 1$.

Let us paraphrase the notion of continuity. In effect, by requiring that a
function $f(x)$ be continuous at $x = a$, we are insisting that if the variation
of the function about the value $L = f(a)$ does not exceed $\pm\varepsilon$,
where $\varepsilon > 0$ is arbitrary, then we can find an x-interval of width
$2\delta$ centred on $x = a$ within which this property is always true. This is
illustrated by Fig. 3-5, which also indicates that in general the number
$\delta$ depends on both $\varepsilon$ and the value of $x$ at which $f(x)$ is
continuous. Thus for the same value of $\varepsilon$, the interval about $x = a$
is of width $2\delta$, whereas the interval about $x = b$ is of width
$2\delta'$, with $\delta' \neq \delta$.

If the function $f(x)$ is continuous in a closed interval $[x_1, x_2]$ and
$\varepsilon$ is given, consider the point $x = b$ at which the function changes
most rapidly, and find the appropriate interval of width $2\delta'$ centred on
$x = b$ in which the functional variation from $f(b)$ does not exceed
$\pm\varepsilon$. Because the functional variation at $x = b$ was the greatest
of any point in $[x_1, x_2]$, it is obvious that if this same interval of length
$2\delta'$ is associated with any other point $x'$ in $[x_1, x_2]$, then the
functional variation within that interval will certainly differ by less than
$\pm\varepsilon$ from the value $f(x')$. Hence we can assert that for a function
$f(x)$ which is continuous in a closed interval, when given an $\varepsilon$ it
is possible to find a number $\delta$ for the definition of continuity which
depends only on $\varepsilon$ and in no way on the value of $x$ at which
continuity is being discussed. Because of this continuity property which applies
uniformly to points throughout the closed interval $[x_1, x_2]$ we speak of such
functions as being uniformly continuous. This concept proves to be of extreme
importance when these ideas are pursued further.

The requirement of continuity in a closed interval cannot be relaxed, for
then the result is no longer true. For example, the function $f(x) = 1/x$
defined in the semi-open interval $(0, 2]$ is continuous, but not uniformly
continuous. This is because for any given $\varepsilon$, the closer we take our
point $x'$ to the origin, the smaller must we take the value of $\delta$ in
order to satisfy $|f(x) - f(x')| < \varepsilon$ for $|x - x'| < \delta$. There
is obviously no single positive value of $\delta$ that will apply to the
entire interval.
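
The way in which $\delta$ must shrink for $f(x) = 1/x$ can be made explicit.
Solving $1/(x' - \delta) - 1/x' = \varepsilon$ for $\delta$ gives
$\delta = \varepsilon x'^2/(1 + \varepsilon x')$; this formula is our own
working and is not given in the text, and it ignores the right-hand end of the
interval. The Python sketch below, with the hypothetical name largest_delta,
tabulates it as $x'$ approaches the origin.

```python
def largest_delta(x0, eps):
    """Largest delta for which |1/x - 1/x0| < eps whenever |x - x0| < delta,
    assuming x0 > 0 (derived by solving 1/(x0 - delta) - 1/x0 = eps)."""
    return eps * x0 ** 2 / (1 + eps * x0)


eps = 0.1
for x0 in (1.0, 0.1, 0.01, 0.001):
    print(x0, largest_delta(x0, eps))
```

The permitted $\delta$ falls roughly like $\varepsilon x'^2$, so it can be made
as small as we please by taking $x'$ close enough to the origin, which is just
the failure of uniform continuity described above.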

There are a number of immediate consequences of the definition of a
limit of a function and of the definition of continuity which we now state as
two important theorems.

Theorem 3-4 (limits) Suppose that $\lim_{x\to x_0} f(x) = L$ and
$\lim_{x\to x_0} g(x) = M$, then

(a) $\lim_{x\to x_0} [f(x) + g(x)] = L + M$;

(b) $\lim_{x\to x_0} f(x)g(x) = LM$;

(c) provided $M \neq 0$, $\lim_{x\to x_0} [f(x)/g(x)] = L/M$.

The proof of these results is similar in all respects to the proof of Theorem 
3-1 and since a representative example was presented there we shall not 
repeat the argument again. 

Theorem 3-5 (continuity) If $f(x)$ and $g(x)$ are continuous at $x = x_0$,
then so also are the functions

(a) $f(x) + g(x)$;

(b) $f(x)g(x)$;

(c) $f(x)/g(x)$, provided $g(x_0) \neq 0$.

If, furthermore, $f(x)$ is continuous at $x = x_0$ and $g(u)$ is continuous at
$u = f(x_0)$, then the continuous function of a continuous function $g[f(x)]$
is continuous at $x = x_0$.

Once again the proof of this theorem is similar in all respects to the proof 
of Theorem 3-1. However for the curious reader we shall prove result 3-5 (a), 
using the alternative definition of continuity that we mentioned. 

To prove $f(x) + g(x)$ is continuous at $x = x_0$ we must establish that
$\lim_{x\to x_0} (f(x) + g(x)) = L$ exists and that $f(x_0) + g(x_0) = L$. Now
as $f(x)$ and $g(x)$ are continuous at $x = x_0$ by supposition, then
$\lim_{x\to x_0} f(x) = f(x_0)$ and $\lim_{x\to x_0} g(x) = g(x_0)$ and so for
any positive $\varepsilon$ there must exist positive numbers $\delta_1$ and
$\delta_2$ such that
$|x - x_0| < \delta_1 \Rightarrow |f(x) - f(x_0)| < \tfrac{1}{2}\varepsilon$ and
$|x - x_0| < \delta_2 \Rightarrow |g(x) - g(x_0)| < \tfrac{1}{2}\varepsilon$.
Now,

$$|(f(x) + g(x)) - (f(x_0) + g(x_0))| = |(f(x) - f(x_0)) + (g(x) - g(x_0))| \le |f(x) - f(x_0)| + |g(x) - g(x_0)|$$

and

$$|x - x_0| < \text{smaller of } (\delta_1, \delta_2) \Rightarrow |f(x) - f(x_0)| + |g(x) - g(x_0)| < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon.$$

Thus, given any positive $\varepsilon$, we have established that by taking
$\delta$ less than either $\delta_1$ or $\delta_2$ we ensure that
$|(f(x) + g(x)) - (f(x_0) + g(x_0))| < \varepsilon$. This formally proves our
assertion. The proofs of results (b) and (c) are similar.

Arguments involving continuity usually rely for their success on the 
knowledge that certain familiar functions are continuous. Once a small list 
of such functions has been established it can then be considerably enlarged 
by repeated applications of Theorem 3-5. Accordingly, we present below a 
table of functions, in each case stating the intervals in which they are con- 
tinuous. No proof will be given for most entries since the results are obvious 
from the graphs but for the sake of completeness we shall formally prove the 
first three entries. 

Example 3-5 

(a) Given that $C$ = constant, the function $f(x) = C$ is continuous
everywhere.

The proof is trivial, since for any $x = x_0$, $f(x_0) = C$ showing that the
definition is always satisfied.

(b) The function $f(x) = x$ is continuous everywhere.

The proof is again trivial, but let us indicate how the alternative definition
of continuity may be used. We must prove that for all $x_0$,
$\lim_{x\to x_0} f(x)$ exists and is equal to $f(x_0)$. Now it is obvious from
the definition of $f(x)$ that $f(x_0) = x_0$. Also, for any $x = x_0$ and given
$\varepsilon > 0$, $|f(x) - f(x_0)| = |x - x_0| < \varepsilon
\Rightarrow |x - x_0| < \varepsilon$ so that in this case the quantity
$\delta = \varepsilon$. The function is thus continuous at $x = x_0$ and, as
$x_0$ was arbitrary, it finally follows that $f(x) = x$ is continuous
everywhere.

(c) The function $f(x) = x^n$ with $n$ a positive integer is continuous
everywhere.

We give a proof by induction. Suppose the result is true for some $n$ so that
$x^n$ is continuous at $x = x_0$ for all $x_0$. Now $x^{n+1} = x \cdot x^n$, and
we have just proved that $x$ is continuous at $x_0$. Hence, using Theorem 3-4
(b), $x^{n+1}$ is continuous. The result is true for $n = 1$ and so by the
principle of induction it is true for all $n$. With a little more care this
result can be shown to be true for any real positive $n$ and not just for $n$ a
natural number.

The information contained in this table is likely to be useful on many
occasions and so should be memorized. Its application, together with
Theorem 3-5, to questions of continuity is usually immediate. Thus, for
example, the function $f(x) = 1/x + \sin x$ is continuous everywhere except at
the point $x = 0$, and $f(x) = (x^m + a_1 x^{m-1} + \cdots + a_m)/\sin x$, with
$m > 0$, is continuous everywhere except at the points $x = n\pi$ for which $n$
is an integer.

Table 3-2 Short list of continuous functions

  Function f(x)                               Interval over which f(x) is continuous
  C (constant)                                (-∞, ∞)
  x                                           (-∞, ∞)
  x^n (n > 0)                                 (-∞, ∞)
  x^(-n) (n > 0)                              (-∞, ∞) excluding point x = 0
  |x|                                         (-∞, ∞)
  x^n + a_1 x^(n-1) + ... + a_n (n > 0)       (-∞, ∞)
  (x^n + a_1 x^(n-1) + ... + a_n)
    / (x^m + b_1 x^(m-1) + ... + b_m)         (-∞, ∞) excluding the zeros of the denominator
  sin x                                       (-∞, ∞)
  cos x                                       (-∞, ∞)
  tan x                                       (2n - 1)π/2 < x < (2n + 1)π/2, integral n
  sec x                                       (2n - 1)π/2 < x < (2n + 1)π/2, integral n
  cosec x                                     nπ < x < (n + 1)π, integral n
  cot x                                       nπ < x < (n + 1)π, integral n

Finally, in preparation for our use of limits in connection with the
techniques of differentiation, we extend the O-notation to include functions of
smaller order. Henceforth, we shall write

$$f(x) = o(g(x)) \quad \text{as } x \to x_0$$

with the meaning that

$$\lim_{x\to x_0} \frac{f(x)}{g(x)} = 0.$$

The symbol $o$ is read 'little oh' and in words the statement asserts that the
function $f(x)$ is of smaller order than $g(x)$ as $x \to x_0$. For example, we
may write $(1 + x^2)^3 = 1 + 3x^2 + o(x^3)$ as $x \to 0$, since
$(1 + x^2)^3 - 1 - 3x^2 = 3x^4 + x^6 = o(x^3)$ as $x \to 0$.
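
The meaning of the little-oh statement in this example can be seen numerically
by forming the quotient of the remainder by $x^3$. The Python sketch below is
our own illustration, and the name ratio is hypothetical; analytically the
quotient equals $3x + x^3$, which tends to zero with $x$.

```python
def ratio(x):
    """Quotient ((1 + x**2)**3 - 1 - 3*x**2) / x**3, analytically 3*x + x**3."""
    return ((1 + x ** 2) ** 3 - 1 - 3 * x ** 2) / x ** 3


for x in (1e-1, 1e-2, 1e-3):
    print(x, ratio(x))
```

For very much smaller x the subtraction in the numerator loses accuracy in
floating-point arithmetic, so the sketch is confined to moderate values.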

3-5 Functions of several variables — limits, continuity 

The related concepts of a limit and the continuity of a function extend without 
difficulty to functions of more than one independent variable, provided only 
that the notion of the proximity of two points is suitably extended. The ideas 
involved here can best be appreciated if we confine attention to functions 
f(x, y) of the two independent variables x and y. 

Let us suppose that $f(x, y)$ has for its domain of definition some region $D$
in the $(x, y)$-plane and that $(x_0, y_0)$ is some point interior to $D$. Then,
before considering $f(x, y)$, we must first make clear what is to be meant by
$x \to x_0$, $y \to y_0$ in $D$.



Fig. 3-8 Paths for which the point $(x, y) \to (x_0, y_0)$.

An inspection of Fig. 3-8 shows that starting from the points $P$ and $Q$ in
$D$, both the full curve and the dotted curve describe possible paths by which
$x$ and $y$ may tend to $x_0$ and $y_0$. In general, we shall write
$x \to x_0$, $y \to y_0$, or, say that the point $(x, y)$ tends to the point
$(x_0, y_0)$, if $\rho \to 0$, where
$\rho = \sqrt{[(x - x_0)^2 + (y - y_0)^2]}$ is the distance between the moving
point $(x, y)$ and the fixed point $(x_0, y_0)$. This simple device then allows
us to interpret a statement about the two variables $x$ and $y$ in terms of a
statement about the single variable $\rho$. By confining attention to a circular
region of radius $\delta$ centred on $(x_0, y_0)$ we may conveniently define a
neighbourhood of the point $(x_0, y_0)$. Any rectangle or other simple closed
geometrical curve containing $(x_0, y_0)$ would, of course, serve equally well
to define a neighbourhood of $(x_0, y_0)$. When using such a neighbourhood it
may or may not be necessary to exclude the boundary and the point $(x_0, y_0)$
itself from the definition of the neighbourhood.

Thus, for example, the square $x = 0$, $y = 0$, $x = 1$, and $y = 1$ defines a
neighbourhood of the point $(\tfrac{1}{2}, \tfrac{1}{2})$. The function

$$f(x, y) = 1/\{xy(x - 1)(y - 1)(x - \tfrac{1}{2})(y - \tfrac{1}{2})\}$$

is defined in this neighbourhood, but not at $(\tfrac{1}{2}, \tfrac{1}{2})$, on
the boundary or on $x = \tfrac{1}{2}$, $y = \tfrac{1}{2}$.

Definition 3-8 is now proposed, with this interpretation of $x \to x_0$,
$y \to y_0$ firmly in mind.

Definition 3-8 The function $f(x, y)$ will be said to tend to the limit $L$ as
$x \to x_0$ and $y \to y_0$, and we shall write

$$\lim_{\substack{x\to x_0 \\ y\to y_0}} f(x, y) = L,$$

if, and only if, the limit $L$ is independent of the path followed by the point
$(x, y)$ as $x \to x_0$ and $y \to y_0$.

As before, we do not necessarily require that $f(x_0, y_0) = L$, as the
functional value actually at the limit point $(x_0, y_0)$ is not involved in the
limit process. If it can be established that the result of the limiting
operation depends on the path taken then, demonstrably, the function has no
limit. The following examples make these ideas clear and, on account of their
simplicity, are offered without proof.

Example 3-6 

(a) If fix, y) = — "— , then lim = — ; 

!/— 1 

(c) if f(x, y) = -— — f— then lim — - — -f- 



x*+y*+V t-.fr x* + y* + I 8+772 

(d) if f(x, y) = — -, then lim f(x,y) does not exist since 

yi x — 1) a;_»i 

lim fix, y) = 1 if taken along the line y = x, but lim/(x, y) = — 1 

x-*l x -*l 

»— 1 y-*l 

if taken along the line y = 2 — x. 
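
Path dependence of this kind is easy to exhibit numerically. The Python sketch
below does not use the function of part (d); instead it takes, purely as an
assumed illustration of our own, $g(x, y) = xy/(x^2 + y^2)$ near the origin,
and the names g and along_line are hypothetical. Along the line $y = kx$ the
value of $g$ is $k/(1 + k^2)$ however close to the origin we go, so different
lines give different limiting values.

```python
def g(x, y):
    """Illustrative function (our choice), undefined at the origin."""
    return x * y / (x ** 2 + y ** 2)


def along_line(slope, t):
    """Value of g at the point (t, slope * t) on the line y = slope * x."""
    return g(t, slope * t)


for slope in (1, -1, 2):
    print(slope, [along_line(slope, t) for t in (1e-1, 1e-3, 1e-6)])
```

Since the printed values depend on the slope but not on t, the limit of $g$ at
the origin does not exist in the sense of Definition 3-8.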

As might be expected, the concept of continuity of a function $f(x, y)$ of
two variables then follows as a direct extension of the definition of a limit.

Definition 3-9 The function $f(x, y)$ will be said to be continuous at the
point $(x_0, y_0)$ if:

(a) $\lim_{\substack{x\to x_0 \\ y\to y_0}} f(x, y) = L$ exists

and

(b) $f(x_0, y_0) = L$.

We shall say that $f(x, y)$ is continuous in a region if it is continuous at all
points $(x, y)$ belonging to that region. Notice that condition (a) demands that
$f(x, y)$ has a unique limit as $x \to x_0$ and $y \to y_0$, and condition (b)
then ensures that there is no 'hole' in the surface $z = f(x, y)$ at the point
$(x_0, y_0)$. The continuity of a function $f(x, y)$ is illustrated in Fig. 3-9
where a circular neighbourhood of the point $(x_0, y_0)$ is shown in relation to
the surface. In effect, continuity of $f(x, y)$ is simply requiring that a small
change in location of the point $(x, y)$ will cause only a small change in
$z = f(x, y)$.




Fig. 3-9 Continuity of $f(x, y)$ at $(x_0, y_0)$ and discontinuity at $(a, b)$.

In Fig. 3-9 the point $(a, b)$ has been deliberately detached from the
otherwise unbroken surface $z = f(x, y)$, so that the function $f(x, y)$ does
not satisfy the definition there and hence is not continuous at that single
point. In general, a function of one or more variables which is not continuous
at a point will be said to have a discontinuity at that point or, alternatively,
to be discontinuous there. Thus the function of one variable shown in Fig. 3-6
has a discontinuity at $x = c$ and the function of two variables shown in Fig.
3-9 is discontinuous at $x = a$, $y = b$.

These ideas also extend to functions of several real variables in an obvious
manner once the 'distance' between two points has been defined satisfactorily.
For functions $f(x, y, z)$ of the three independent variables $x$, $y$, $z$ a
suitable distance function between points $(x_1, y_1, z_1)$ and
$(x_0, y_0, z_0)$ is the linear distance between them when plotted as points
relative to three mutually perpendicular Cartesian axes. The distance $\rho$ is
then given by the Pythagoras rule as
$\rho = \{(x_1 - x_0)^2 + (y_1 - y_0)^2 + (z_1 - z_0)^2\}^{1/2}$.

The interpretation of distance in the so-called finite dimensional spaces of
$n$ dimensions generated by functions of $n$ independent variables is of
considerable importance in mathematics. Essentially, of any function
$\rho(P, Q)$ measuring the distance between points $P$ and $Q$ in the space we
require that for any points $P$, $Q$, and $R$:

(a) $\rho(P, Q) \ge 0$,
(b) $\rho(P, Q) = 0$ if, and only if, $P = Q$,
(c) $\rho(P, Q) = \rho(Q, P)$,
(d) $\rho(P, R) \le \rho(P, Q) + \rho(Q, R)$.

It is easy to check that the two distance functions already defined satisfy the
above conditions, but this will be left as an exercise for the reader.
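
Before attempting the exercise the reader may care to test conditions (a) to
(d) numerically. The Python sketch below is ours; the names rho and
satisfies_axioms, the tolerance tol, and the use of randomly chosen points are
all assumptions made for illustration, and such a test is no substitute for a
proof.

```python
import math
import random


def rho(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def satisfies_axioms(p, q, r, tol=1e-12):
    """Check conditions (a)-(d) for the points p, q, r, allowing a small
    tolerance tol for floating-point rounding."""
    return (
        rho(p, q) >= 0                                  # condition (a)
        and (rho(p, q) == 0) == (p == q)                # condition (b)
        and abs(rho(p, q) - rho(q, p)) <= tol           # condition (c)
        and rho(p, r) <= rho(p, q) + rho(q, r) + tol    # condition (d)
    )


random.seed(0)  # fixed seed so that the run is repeatable
for dim in (2, 3):
    pts = [tuple(random.uniform(-1, 1) for _ in range(dim)) for _ in range(30)]
    ok = all(satisfies_axioms(p, q, r) for p in pts for q in pts for r in pts)
    print(dim, ok)
```

The same function rho serves for any number of dimensions, since it simply sums
the squared differences of whatever coordinates are supplied.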

Again the determination of the regions in which any given function is 
continuous will usually be done either on an intuitive or on a graphical basis. 
Thus, in Example 3-6 it is easily seen that: 

2x 

(a) f(x,y) = -— — 2 is continuous everywhere; 

xy + 1 . 

(b) f(x, y) = 2 2 is continuous everywhere except at x = 0, y = 0; 

x ~t" y 

, „ -, s sin xy 

(c) f(x, y) = —— — is continuous everywhere; 

' x 2 + J 2 + 1 J 

(d) f(x,y) = — — is continuous everywhere except at (0, 0) and (1, 1) 

and along x = 1 and y = 0. 

3-6 A useful connecting theorem 

By now it will have become apparent that there is a strong connection 
between theorems concerning limits of sequences and the corresponding 
theorems concerning limits of functions. In fact, with only trivial modification, 
most limit theorems that are true for sequences are also true for functions. 
Naturally this is no coincidence and the reason is explained by this connecting 
theorem. 

Theorem 3-6 Let $f(x)$ be a function defined for all $x$ in some interval
$a < x < b$. Further, let $\{x_n\}$ be a sequence defined in the same interval
which converges to a limit $a$ that is not a member of the sequence. Then if,
and only if, $\lim_{n\to\infty} f(x_n) = L$ for each such sequence $\{x_n\}$, it
follows that $\lim_{x\to a} f(x) = L$.

The proof of this connecting theorem comprises two distinct parts. First
it must be established that if $\lim_{x\to a} f(x) = L$, then sequences
$\{x_n\}$ exist having the required property. Second, the converse result must
be proved; that if the required sequences $\{x_n\}$ exist, then
$\lim_{x\to a} f(x) = L$. Together, these two results will ensure that the
theorem works in both directions, so that corresponding function and sequence
limit theorems satisfying the necessary conditions may be freely interchanged
without further question.

The first part of the proof is a direct consequence of Definitions 3-3 and
3-5. It follows from Definition 3-5 that when $x$ is confined to some
neighbourhood $N_a$ of $a$, then $f(x)$ is confined to a neighbourhood $N_L$ of
$L$. From Definition 3-3, since $\{x_n\}$ has the limit $a$, there must be some
number $n_0$ such that for $n > n_0$ the term $x_n$ lies in $N_a$, and it
follows that $f(x_n)$ will also be confined to the same neighbourhood $N_L$ of
$L$.

The second step is a little harder, since it involves an indirect proof by
contradiction. It involves showing that if we assume that
$\lim_{x\to a} f(x) \neq L$, then a sequence $\{z_n\}$ can be found satisfying
all the requirements of the theorem, for which
$\lim_{n\to\infty} f(z_n) \neq L$. Hence the contradiction showing that the
assumption $\lim_{x\to a} f(x) \neq L$ was false. We leave the details of this
to any interested reader as an exercise.

To close this chapter, we shall use this theorem together with geometrical 
arguments to establish the three useful limits: 

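$$\lim_{\theta\to 0}\left(\frac{\sin\alpha\theta}{\theta}\right) = \alpha; \tag{3-9}$$

$$\lim_{\theta\to 0}\left(\frac{1-\cos\alpha\theta}{\theta}\right) = 0; \tag{3-10}$$

$$\lim_{\theta\to 0}\left(\frac{1-\cos\alpha\theta}{\theta^2}\right) = \frac{\alpha^2}{2}. \tag{3-11}$$

These limits are all of the indeterminate variety mentioned earlier and,
although this topic will receive special mention in a subsequent chapter, it is
important for the development of our work that they be examined now. We
shall establish that they are all related to the single limit

$$\lim_{\theta\to 0}\left(\frac{\sin\theta}{\theta}\right) = 1,$$

which we prove first.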

Consider Fig. 3-10 which represents a circular arc of unit radius with its
centre at O, inscribed in the right-angled triangle OAB.

Then it is obvious that

Area of triangle OAC < Area of sector OAC < Area of triangle OAB.

Expressed in terms of the angle $\theta$ measured in radians this becomes

$$\tfrac{1}{2}\sin\theta < \tfrac{1}{2}\theta < \tfrac{1}{2}\tan\theta,$$

from which we see that

$$\cos\theta < \frac{\sin\theta}{\theta} < 1. \tag{A}$$

Fig. 3-10 Area inequalities.



This result must be true for all acute angles and, in particular, for the
values of the sequence $\{\theta_n\}$ defined by $\theta_n = 1/n$. Thus (A)
takes the form

$$\cos\theta_n < \frac{\sin\theta_n}{\theta_n} < 1 \tag{B}$$

and, since $\lim_{n\to\infty}\theta_n = 0$ where the limit is not a member of
the sequence, we may combine Theorems 3-2 and 3-6 to deduce that

$$\lim_{\theta\to 0}\left(\frac{\sin\theta}{\theta}\right) = 1. \tag{3-12}$$
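As a purely numerical illustration of inequality (B), and not as a substitute for the proof, the following Python sketch tabulates the three members of (B) for a few terms of the sequence $\theta_n = 1/n$. The function name `squeeze_row` and the chosen values of n are our own illustrative assumptions.

```python
import math

def squeeze_row(n):
    """Return (cos(theta_n), sin(theta_n)/theta_n, 1) for theta_n = 1/n."""
    theta_n = 1.0 / n
    return math.cos(theta_n), math.sin(theta_n) / theta_n, 1.0

# Tabulate inequality (B) for a few members of the sequence theta_n = 1/n.
rows = [squeeze_row(n) for n in (1, 10, 100, 1000)]
for lower, middle, upper in rows:
    print(f"{lower:.10f} < {middle:.10f} < {upper:.1f}")
```

The middle column is trapped between the outer two, and both outer columns approach 1 as n increases, which is the behaviour that Theorem 3-2 turns into the limit (3-12).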



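To establish limit (3-9) it is only necessary to replace $\theta$ in Eqn (3-12)
by $\alpha\theta$, giving rise to

$$\lim_{\theta\to 0}\left(\frac{\sin\alpha\theta}{\alpha\theta}\right) = 1$$

or, equivalently,

$$\lim_{\theta\to 0}\left(\frac{\sin\alpha\theta}{\theta}\right) = \alpha.$$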



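The limits (3-10) and (3-11) then follow by using the identity
$1 - \cos\alpha\theta = 2\sin^2\tfrac{1}{2}\alpha\theta$ to form the expressions

$$\frac{1-\cos\alpha\theta}{\theta} = 2\sin\tfrac{1}{2}\alpha\theta\left(\frac{\sin\tfrac{1}{2}\alpha\theta}{\theta}\right),$$

and

$$\frac{1-\cos\alpha\theta}{\theta^2} = 2\left(\frac{\sin\tfrac{1}{2}\alpha\theta}{\theta}\right)^2.$$

Applying result (3-9) to these we finally arrive at the required results

$$\lim_{\theta\to 0}\left(\frac{1-\cos\alpha\theta}{\theta}\right) = 0$$

and

$$\lim_{\theta\to 0}\left(\frac{1-\cos\alpha\theta}{\theta^2}\right) = 2\left(\frac{\alpha}{2}\right)^2 = \frac{\alpha^2}{2}.$$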
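The three limits (3-9) to (3-11) can be examined numerically. The Python sketch below evaluates the three quotients for decreasing values of $\theta$, using the identity $1 - \cos\alpha\theta = 2\sin^2\tfrac{1}{2}\alpha\theta$ from the text to avoid subtracting nearly equal numbers. The name `quotients` and the choice $\alpha = 3$ are illustrative assumptions only.

```python
import math

def quotients(alpha, theta):
    """Evaluate the three quotients in (3-9) to (3-11) at a small angle theta."""
    half = math.sin(0.5 * alpha * theta)
    one_minus_cos = 2.0 * half * half   # identity 1 - cos(a*t) = 2 sin^2(a*t/2)
    return (math.sin(alpha * theta) / theta,
            one_minus_cos / theta,
            one_minus_cos / theta**2)

alpha = 3.0
for theta in (1e-1, 1e-2, 1e-3):
    print(theta, quotients(alpha, theta))
```

With $\alpha = 3$ the three columns should settle towards $\alpha = 3$, $0$, and $\alpha^2/2 = 4.5$ respectively as $\theta$ decreases.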



The following general result is sometimes useful and, as we shall show 
by example, may be combined with Eqns (3-9) to (3-11) to give a number 
of interesting results. 

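Suppose $f(x)$ and $g(x)$ are two functions such that
$\lim_{x\to a} f(x) = \alpha$ and $\lim_{x\to a} g(x) = \beta$, where $\alpha$
and $\beta$ are both finite. Then, clearly,

$$\lim_{x\to a}\,[f(x)]^{g(x)} = \left[\lim_{x\to a} f(x)\right]^{\lim_{x\to a} g(x)} = \alpha^{\beta}.$$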

This result, which is true in general, is of course also true when one or 
more of the limits involved is of the form Eqns (3-9) to (3-11). 

Example 3-7

(a) $\displaystyle\lim_{x\to 1}\left(\frac{x^3 + 2x^2 + x + 1}{x^2 + 2x + 3}\right)^{[1-\cos 2(x-1)]/(x-1)^2}$

(b) $\displaystyle\lim_{x\to 0}\left(\frac{1-\cos 3x}{x^2}\right)^{(\sin 2x)/x}$

Solution to (a) Here $f(x) = (x^3 + 2x^2 + x + 1)/(x^2 + 2x + 3)$, so that
$\lim_{x\to 1} f(x) = 5/6$ and as $g(x) = [1 - \cos 2(x-1)]/(x-1)^2$, it
follows from Eqn (3-11) that $\lim_{x\to 1} g(x) = 2$. Hence,
$\lim_{x\to 1}\,[f(x)]^{g(x)} = (5/6)^2 = 25/36$.

Solution to (b) In this case $f(x) = (1 - \cos 3x)/x^2$ and $g(x) = (\sin 2x)/x$.
A direct application of Eqns (3-9) and (3-11) then shows that
$\lim_{x\to 0} f(x) = 9/2$ and $\lim_{x\to 0} g(x) = 2$ and thus
$\lim_{x\to 0}\,[f(x)]^{g(x)} = (9/2)^2 = 81/4$.
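The two results of Example 3-7 can be checked numerically by evaluating each expression close to the limiting point. The following Python sketch does this; the function names `example_a` and `example_b` and the offset $10^{-4}$ are our own illustrative choices, and the half-angle identity is used in place of $1 - \cos$ to limit rounding error.

```python
import math

def example_a(x):
    """[f(x)]**g(x) for part (a), evaluated near x = 1."""
    f = (x**3 + 2 * x**2 + x + 1) / (x**2 + 2 * x + 3)
    t = x - 1
    g = 2 * math.sin(t) ** 2 / t**2          # equals [1 - cos 2(x-1)]/(x-1)^2
    return f ** g

def example_b(x):
    """[f(x)]**g(x) for part (b), evaluated near x = 0."""
    f = 2 * math.sin(1.5 * x) ** 2 / x**2    # equals (1 - cos 3x)/x^2
    g = math.sin(2 * x) / x
    return f ** g

print(example_a(1 + 1e-4), 25 / 36)
print(example_b(1e-4), 81 / 4)
```

Each printed pair should agree to several figures, in keeping with the values 25/36 and 81/4 obtained above.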

PROBLEMS 

Section 3-1

3-1 Give an example of a numerical sequence and of a non-numerical sequence. 






3-2 Use the terms bounded, unbounded, strictly monotonic increasing, and 
strictly monotonic decreasing to classify the sequences {u n } which have the 
following general terms: 



(a) ii. = (-«)«+!; (b) u n 



-(-?' 



(c) $u_n = \sin(1/n)$; (d) $u_n = 2 + (-1)^n$;

(e) $u_n = \dfrac{n+1}{2n+3}$; (f) $u_n = \dfrac{2n+3}{n+1}$.

3-3 Give an example of each of the following types of sequence: 

(a) bounded; (b) strictly monotonic decreasing; (c) monotonic decreasing; 

(d) strictly monotonic increasing; (e) bounded above; (f) bounded below. 

3-4 Use an ordinary graph to plot the first ten terms of the sequence {u n } for which 
k» = (-J)»(» + 2)/«. 

3-5 Using the device described in connection with Fig. 3-1 (b) to compress the
horizontal axis, plot the first five terms of the sequences $\{u_n\}$ which have
the general terms:



(a) u n = (■ 



•«" (m-- 



n J 

(b) Un = 1 + 2 ~y 
r = l r- 

Section 3-2 

3-6 Find a neighbourhood (a, b) of the sequence $\{1 + (-1)^n/n\}$ such that

(a) there are 100 terms outside it;

(b) there are 10,000 terms outside it.

Deduce that there are infinitely many terms inside any such neighbourhood. 

3-7 Find a neighbourhood (a, b) of the sequence $\{(2n + 1)/n\}$ such that

(a) there are 10 terms outside it; 

(b) there are 1,000 terms outside it. 

3-8 Name the limit points of the sequence $\{u_n\}$ which has the general term
$u_n = \sin\,[(n + 1)/2]\pi$. Identify the sub-sequences that determine these
limit points.

3-9 Name the limit points of the sequence $\{u_n\}$ with the general term
$u_n = \sin\,[(n^2 + n + 1)/2n]\pi$. Identify the sub-sequences that converge to
these limit points.

3-10 Give examples of sequences having (a) no limit point, (b) one limit point, 

(c) two limit points. 

3-11 Name the limit points of the sequence {u n } which has the general term 
1 ~ 32S for " even 

Un= { 

——7 for n odd. 






State whether or not the limit points belong to the sequence. 
3-12 Determine the following limits: 

(a) hm ; 

n— oo n 3 

n* i im (2» 8 + n - !)(« + 2) . 
{h) l™ (3»» + 7«+ll) ' 

(c) lim ;; — TTTn' 

... ,. n + (-2)" 
(d) hm ; — — -; 

.. ,. /l 2 + 2 2 + 3 2 +- • - + tfl\ 
(e) ^ ( W J" 

3-13 Give an expression for the nth term of the sequence $\sqrt{2}$,
$\sqrt{2\sqrt{2}}$, $\sqrt{2\sqrt{2\sqrt{2}}}$, .... Use your result to deduce
the limit of the sequence.

3-14 Determine the limits : 

(a) $\lim_{n\to\infty}\left(\sqrt{n + a} - \sqrt{n}\right)$, where $a > 0$ is any real number;

„ „ ,. nil sin n — 3 cos 2n) 

(b) lim ; 

„—«, n 2 + 2/i + 1 

(3«+2 -(. 5«+2\ 

(d) lim »-v/(l + "")(a ^ °)- 

n— > oo 

3-15 Use the O notation to express the behaviour of the following expressions for 
large x: 

(a) 2x 2 + x + sin (1/x); (b) 3 + -; 

, N 3x 3 + 2x + 1 ,., * 3 sin x + 1 

(C) x 2 +l ; (d) x3 + 3 ; 

x 2 

(C) V(* 3 +X+1) 

3-16 Suppose that the sequences $\{u_n\}$, $\{v_n\}$, and $\{w_n\}$ are such
that $u_n < w_n < v_n$ for all n greater than some fixed number $n_0$, and that
$\{u_n\}$ converges to the limit L and $\{v_n\}$ converges to the limit M. Show
by example that the sequence $\{w_n\}$ need not converge to a limit.

3-17 Outline the details of the proof of Theorem 3-2. (Hint: Consider the limits
of the sequences $\{u_n - w_n\}$ and $\{w_n - v_n\}$.)

3-18 Give two different proofs of the convergence of the sequence {u n } in which 

11 11 

u n =l+~ 3 + ^ + • • - + 3^1+ —^> appealing first to Theorem 3-1 (a) 

and then to Theorem 3-2. 




3-19 Use Theorem 3-2 to prove the convergence of the sequence {u n } in which 

1 / 1\ w 2 / 2\ -n 3 . / 3\ w 
«„ = - 2 sm^l+-j-+- 2 s 1 n^H--j- + -s.n^l+-j- + --- 



n — 1 . / , n — 1\ w 

sin 1+ -. 

n \ n J 2 



3-20 Let $\{u_n\}$ be an increasing sequence bounded above by m. Let this bound
m, together with the members $u_n$ of the sequence, be represented by points on
a line. Then, either the mid-point $m_1 = \tfrac{1}{2}(u_1 + m)$ of the line
segment between $u_1$ and m is an upper bound of $\{u_n\}$, or it is not.
According as $m_1$ is, or is not, an upper bound of $\{u_n\}$, take for the next
point $m_2$ the mid-point of the half line segment to the left or right of
$m_1$, respectively. Next, according as $m_2$ is, or is not, an upper bound of
$\{u_n\}$, take for the next point $m_3$ the mid-point of the quarter line
segment to the left or right of $m_2$, respectively. Repeat this process
indefinitely to generate an infinite sequence of points $\{m_r\}$ as indicated
in the diagram.

[Diagram: the bound m and the limit L of $\{u_n\}$ marked on a line.]

Give reasons why

(a) $\{m_r\}$ has a single limit point L;

(b) the fact that $\{u_n\}$ is an increasing sequence implies that
$\lim_{n\to\infty} u_n = L$.
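The halving construction of Problem 3-20 can be imitated on a computer. The Python sketch below is only an illustration under stated assumptions: the sequence $u_n = 1 - 1/n$ with bound m = 2 is our own choice, and the test "is an upper bound" is approximated by comparing with a single late term of the increasing sequence, which is not an exact test. The names `u`, `is_upper_bound`, and `bisect_bounds` are hypothetical.

```python
def u(n):
    """Illustrative increasing sequence, bounded above by m = 2, with limit 1."""
    return 1.0 - 1.0 / n

def is_upper_bound(candidate, n_terms=100_000):
    # For an increasing sequence the largest of the first n_terms is the last.
    # Checking finitely many terms only approximates the true test.
    return candidate >= u(n_terms)

def bisect_bounds(m, steps):
    """Generate the mid-points m_1, m_2, ... described in Problem 3-20."""
    left, right = u(1), m
    points = []
    for _ in range(steps):
        mid = 0.5 * (left + right)
        points.append(mid)
        if is_upper_bound(mid):
            right = mid   # next mid-point lies in the segment to the left
        else:
            left = mid    # next mid-point lies in the segment to the right
    return points

points = bisect_bounds(2.0, 30)
print(points[:5], points[-1])
```

For this choice of sequence the points settle close to 1, the limit of $\{u_n\}$, up to the error introduced by testing only finitely many terms.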

3-21 Let $u_n = \tfrac{1}{2}\left(u_{n-1} + (a/u_{n-1})\right)$ and
$v_n = (u_n - \sqrt{a})/(u_n + \sqrt{a})$, where $u_1$ and $a$ are any positive
numbers. By showing that
$v_n = v_{n-1}^2 = v_{n-2}^4 = v_{n-3}^8 = \cdots = v_1^{2^{n-1}}$, deduce the
result $0 \le v_n \le |v_1|^n$. Then, using Theorem 3-2, prove that
$\lim_{n\to\infty} v_n = 0$ thereby establishing that
$\lim_{n\to\infty} u_n = \sqrt{a}$.
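The recurrence of Problem 3-21 is easily explored numerically. The following Python sketch generates a few terms and the associated quantities $v_n$; the values $a = 2$ and $u_1 = 1$, and the names `sqrt_terms` and `v`, are illustrative assumptions rather than part of the problem.

```python
import math

def sqrt_terms(a, u1, count):
    """Return u_1, ..., u_count for u_n = (u_{n-1} + a/u_{n-1}) / 2."""
    terms = [u1]
    for _ in range(count - 1):
        prev = terms[-1]
        terms.append(0.5 * (prev + a / prev))
    return terms

def v(u_n, a):
    """The quantity v_n = (u_n - sqrt(a)) / (u_n + sqrt(a))."""
    root = math.sqrt(a)
    return (u_n - root) / (u_n + root)

terms = sqrt_terms(2.0, 1.0, 5)
ratios = [v(t, 2.0) for t in terms]
for t, r in zip(terms, ratios):
    print(f"{t:.10f}  {r:.3e}")
```

Each printed $v_n$ is, to rounding error, the square of its predecessor, which is why so few terms are needed before $u_n$ agrees with $\sqrt{a}$ to many figures.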

3-22 Using the algorithm $u_n = \tfrac{1}{2}\left(u_{n-1} + \dfrac{3}{u_{n-1}}\right)$
compute to four figures the first five terms in the sequence $\{u_n\}$
corresponding to the starting values (a) $u_1 = 1$, (b) $u_1 = 2$. Compare your
results with the limiting value $\sqrt{3}$.

3-23 Using the algorithm u n = Mu n -i H A compute to four figures the first 

five terms in the sequence $\{u_n\}$ corresponding to the starting values
(a) $u_1 = 1$, (b) $u_1 = 2$. Compare your results with the limiting value
$\sqrt[3]{5}$.

Section 3-3 

The following two related problems show how the approximate behaviour of $e^x$
in the interval $-2 < x < 2$ may be inferred directly from the sequence
$\{v_n(x)\}$.

3-24 Define $v_n(x)$ by the expression

$$v_n(x) = \left(1 + \frac{x}{n}\right)^n.$$

Use essentially the same arguments as those leading to Eqn (3-4) to prove
that $\{v_n(x)\}$ is a strictly increasing sequence for any fixed positive x and
then show that

$$v_n(x) \le 1 + x + \frac{x^2}{2} + \frac{x^3}{2^2} + \cdots + \frac{x^n}{2^{n-1}}.$$

By summing this expression and taking the limit as $n \to \infty$ deduce that

$$1 \le e^x \le \frac{2 + x}{2 - x} \quad \text{for } 0 \le x < 2.$$

Compare this result with Fig. 3-3.

3-25 Using the same definition of $v_n(x)$ as above, form the sub-sequences
$\{v_{2m}(x)\}$ of even terms and $\{v_{2m+1}(x)\}$ of odd terms. Modify
slightly the arguments used in the previous example to prove that both
sub-sequences are strictly monotonic decreasing for negative x. Show that
$\{v_{2m+1}(x) - v_{2m}(x)\}$ is a null sequence and hence deduce that both the
even and odd sequences tend to the same limit. Modify $v_{2m}(x)$ to establish
that

x 2 x 3 x 2m 



V 2m (x) >\~ X+ ---+■■ ■+ ^n 



By summing this expression and taking the limit as ;/ -»• co deduce that 

2 — x 

< e* < 1 for < x < 2. 

2 + x ~ ~ 

Compare this result with Fig. 3-3. 

Section 3-4 

3-26 Determine the following limits of functions: 

(a) lim x 3 - x 2 + x + 1 ; (b) lim * t *"!" S 

z~a *~*3 x 3 — 1 

/ ^ r V(x2 ~ 6) mm- x 3 + x 2 ~x-2 

(c) hm — ; (d) lim 



.3 x 2 + 1 ' v ^__ 2 (x + \)(x + 2) 

(e) lim - + A)3 ~ * -; (f) lim {\/(x 2 + 1000) - VO 2 - 1000)}; 



(g) lim x[V(.x 2 + 3) - jc]. 

Determine these limits whei 

(x 3 + x - 1 f( 
(a) \imf(x) where /(x) = ( 

«-i 1 1 + sin (.v — 



:r-*cc 

3 r 27 Determine these limits when they exist : 

- 1 for x < 1 

1) for x > 1 ; 
x - 1 



(b) lim 



f. 
(c) lim/(x) where f(x) = ( 



r 

x 2 + sin i « for x < 3 
4 + x 2 for x > 3 ; 



(d) lim |x 2 -l |; (e) lim ? + C ° S * ■ 

x-~i x-~\* 1 - sm x 




3-28 Determine the left- and right-hand limits of these functions at the stated 
points: 

3^+1 _j. 5^+1 

(a) lim — — ; 

*-2± 3* + 5* 

,w> r /•/■-> u rr ^ f 1 + 2 sin* for .t <\* 

(b) hm f(x) where /(a-) =j 

x—u± |cosec x for x > \-n\ 

(c) lim I x 2 + x - 1 I ; 

x— 2± 

-2 for x < 



(d) lim /"(*) where /'(jf) = 
,^o± 7W yv U+-|x|forjc>0; 

(e) lim -^— • 

z->-3± J — •* 

3-29 Determine the domains of definition for which these functions are continuous : 
(a) /(*) = x + M ; (b) f{x) = l/(x 2 - 1) ; 

^ « ^ * 5 + x 2 - 1 x s + 4x 2 + x _ 6 

(C) * x) = 4 + sinx-2cosx' (d) ^ W = (x - l)(x + 4) ; 

!2x + sin x for x # « 77/2 
« 2 + 1 . ,. 

3-30 Give examples of functions of the following type: 

(a) continuous everywhere except at x = 1 and x = 2; 

(b) discontinuous at the points $x = n\pi$ with n an integer;

(c) continuous everywhere but neither purely algebraic nor purely trigo- 
nometric; 

(d) continuous everywhere except at x = 1, where the left-hand limit is —1 
and the right-hand limit is 3 ; 

(e) continuous everywhere except at x = 1, where the left-hand and right- 
hand limits both equal 2. 

3-31 Suppose it is known that a function $f(x)$ is continuous over the interval
$x_0 \le x \le x_2$, and that $f(x_0) = y_0$, $f(x_1) = y_1$ and
$f(x_2) = y_2$. Explain why it is reasonable to assume that when the functional
values $y_0$, $y_1$, and $y_2$ are reasonably close together, $f(x)$ may in some
sense be represented by the expression

$$f(x) \approx \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)}\,y_0 + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)}\,y_1 + \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}\,y_2.$$

Any formula such as this, from which the behaviour of a function over an
interval is inferred from its behaviour at specific points in that interval, is
called an interpolation formula. This particular one is called the three point
Lagrangian interpolation formula and we shall see later that it gives exact
results when applied to any linear or quadratic function $f(x)$. Considering
$y = \sin x$ for $0 \le x \le 3\pi$, explain how this formula might give
misleading results.
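The three point formula of Problem 3-31 translates directly into a short routine. The Python sketch below is one possible transcription; the name `lagrange3` and the quadratic used to exercise it are our own illustrative assumptions.

```python
def lagrange3(xs, ys, x):
    """Three point Lagrangian interpolation through (x0,y0), (x1,y1), (x2,y2)."""
    x0, x1, x2 = xs
    y0, y1, y2 = ys
    l0 = (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
    l1 = (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
    l2 = (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
    return l0 * y0 + l1 * y1 + l2 * y2

def q(x):
    """An arbitrary quadratic used to illustrate exactness."""
    return 2 * x**2 - 3 * x + 1

nodes = (0.0, 1.0, 3.0)
values = tuple(q(t) for t in nodes)
print(lagrange3(nodes, values, 2.0), q(2.0))
```

Because the data come from a quadratic, the interpolated and true values agree apart from rounding, in keeping with the remark that the formula is exact for linear and quadratic functions.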






3-32 Apply the expression given in Problem 3-31 to the function $y = \sin x$,
taking as the points $x_0$, $x_1$, and $x_2$ the respective radian arguments
0.6, 0.9, and 1.2 and so find the appropriate three point Lagrangian
interpolation formula over the interval $0.6 \le x \le 1.2$. Use your result to
deduce approximate values for sin 0.8 and sin 1.1 and compare these with the
exact tabulated values.

3-33 Repeat the previous problem, but this time take $x_0 = 0.4$, $x_1 = 1.2$,
and $x_2 = 1.7$ and deduce approximate values for sin 0.9 and sin 1.5. Compare
your results with the exact tabulated values.

3-34 Consider the continuous function $f(x)$ defined on the interval [0, 2] by
the rule $f(x) = x$ for $0 \le x \le 1$ and $f(x) = 2 - x$ for $1 < x \le 2$.
Taking $x_0 = 0.2$, $x_1 = 0.8$, $x_2 = 1.3$, apply the expression given in
Problem 3-31 in order to find an interpolation formula over the interval
[0.2, 1.3]. Compare the approximate and exact values at x = 0.5, 0.7, and 1.0.

3-35 The density of the material of a rod of length L is a function $f(x)$ of
the distance x measured from one end. Describe in physical terms, rods that are
characterized by the following functions $f(x)$:
(a) $f(x) = \text{constant}$ for $0 \le x \le L$;

I pi for < x < §L 

(b) /(*)=" 



(c) $f(x) = \rho(1 + kx)$ for $0 \le x \le L$.

3-36 If the function $f(x)$ has the same meaning as above, specify the
functional forms it must take in order that it describes:

(a) a rod of length L having constant density $\rho_1$ over half its length and
a density that changes steadily (that is, linearly) with distance from $\rho_1$
to $\rho_2$ over the remaining half of the rod;

(b) a rod of length L comprising three sections of equal length with constant
densities $\rho_1$, $\rho_2$, and $\rho_3$ in each section;

(c) a rod of length L having a density that increases quadratically with x
(that is, like the square of x) from $\rho_1$ at x = 0 to $\rho_2$ at x = L.

Section 3-5 

3-37 Let $f(x, y)$ denote the density of the material at the point (x, y) of a
thin flat plate in the (x, y)-plane. Give the functional forms of $f(x, y)$ in
order that it should describe:

(a) a circular plate of radius R centred at the origin, with the material to the
left of the y-axis having a density $\rho_1$ and the material to the right a
density $\rho_2$;

(b) a circular disc of inner radius R and outer radius 3R in which the density 
is constant and equal to p out to a circle of radius 2R, after which it 
decreases linearly to the value Jp at the outer edge of the disc; 

(c) an isosceles triangle with its apex at the origin and sides of length L 
lying to the right of the y-axis and inclined at angles \-* and —\t, respec- 
tively, to the x-axis, with the material above the x-axis having a density 
Pi and the material below the x-axis having a density P2. 

3-38 Let point P have the Cartesian coordinates (1, 1), and let $N_1$ denote the
unit circle drawn with P as its centre. Define $N_r$ to be a circle, concentric
with $N_1$, and let us agree to write $N_{r+1} \subset N_r$ if the circle
$N_{r+1}$ is contained within the circle $N_r$. Then $N_{r+1} \subset N_r$, for
all r, describes a family of neighbourhoods of the point P. Give examples of
families $\{N_r\}$ of neighbourhoods of P that:

(a) have the property that lim (radius of Nr) ->- J; 

r— ►co 

(b) have the property that lim (radius of N r ) ->- 0; 

r— *-co 

(c) have the property that; area N r +i = J area N r and lim (area of N r ) -> \. 

r->co 

3-39 State the largest neighbourhood about the stated points P in which the 
following functions are defined. Also state if they are defined at P and on the 
boundary of the neighbourhood : 

(a) f(x,y) = \l{xy(lx - l)(y + 2)(x + l)(y - 2)} taking point P as (- 1, 2); 

(b) /(x, v) = _ a _ 2 taking point P as (0, 0); 

1 + x 2 + y 
3-40 Determine these limits when they exist : 



(c) f(x,y) = 1 ^ taking point P as (2, 3). 



/ ^ ,. ^ 2 y 2x 2 + xv + 1 

(a) ^ 2, 2 + 2v 2 +i ; (b) ," m K , 2 + 2,; + , 2 ; 

!/— 2 J/--.2 



(c) ,im fr-')™* . (d) lim 

:r-2 X i — 4 j^o 

W— 1 V— Iff 



1+2 cos xv + sin xy 

2 + xy 



3-41 Give examples of functions f{x,y) having these properties: 
(a) Km f{x, y) = 2; (b) Mm. fix, y) = 0; 

W—3 )/-> Jtt 

(c) lim fix, y) does not exist. 

V—-3 

3-42 Find the points or lines of discontinuity of these functions: 
(0 for x 2 + y 2 = 1 

(a) fix, v) = x sin xv 

[ t _ X 2 _ y2 elsewhere; 

(3 for x — 1 , y = 2 
1+2 , 2+/ elsewhere ; 

■ y — 1 

,,, ,, , x 2 sin v + y 2 sin x + 2 

(d)/(*.,y)= 3c4 / 2jcV+J>4+1 - 

3-43 Let P and Q be any two points in the (x, y)-plane. Prove that if the
distance function $\rho(P, Q)$ is taken to be the length of the straight line
joining P to Q then:

(a) $\rho(P, Q) \ge 0$;

(b) $\rho(P, Q) = 0$ if, and only if, P = Q;

(c) $\rho(P, Q) = \rho(Q, P)$;

(d) $\rho(P, R) \le \rho(P, Q) + \rho(Q, R)$, where R is another point distinct
from P and Q.

3-44 Repeat the proof of the previous problem, but this time let P, Q, and R be 
points in space. 

Section 3-6 

3-45 Apply the results of Section 3-6 to determine these limits: 
x r , „ ,. 1 — x'2 cos x 



(a) lim — -; (b) lim , r _ , A 



*-o V(l - cos x)' ,^.„ y'[2 sin (x - i^)]' 

sin (x + h) - sin x 2 sin 3 (x/4) 

(c) hm ; (d) lim ; 

A-m h , r ^o x 3 

, . ,. I sin x | 
(e) lim 



x—o± x 

3-46 Apply the results of Section 3-6 to determine these limits: 

^ r i 2 _l u , ^ / cos(x + h) - cosx \ 

(a) lim (x 2 + hx + 1) ; 

ii^o \ n ] 

„ . ,. sin x — sin a , . ,. / sin « 

(b) hm ; (c) lim — 



x \ sin xjta' 



(1 \ /x 2 •— x + 4\( si " -■ f '/- r 
a: sin- ; (e) lim — — - ; 

x) .r^o\x 2 - x + 1/ 



■v_2\ t si " 3(-c - 2)]/(.r - 2) 

(f) lim ' X 



,_ 2 \x* - 4) 
3-47 If $h(x)$ is a function for which $\lim_{x\to a} h(x) = 0$, use Theorem 3-6
to justify writing

$$\lim_{x\to a}\,(1 + h(x))^{1/h(x)} = e.$$

3-48 Let functions $f(x)$ and $g(x)$ be such that $\lim_{x\to a} f(x) = 1$ and
$g(x) \to \infty$ as $x \to a$, so that we may write $f(x) = 1 + h(x)$ where
$\lim_{x\to a} h(x) = 0$. Then, considering the function $[f(x)]^{g(x)}$, use
the result of Problem 3-47 to show that

$$\lim_{x\to a}\,[f(x)]^{g(x)} = e^{\lim_{x\to a} h(x)g(x)}.$$

3-49 Use the result of Problem 3-48 to determine these limits: 
(a) lim (l--Y; (b) lim (^JV; 

x^oz \ Xj x ^„ \X + I J 

(c) lim ( — ^-r) ; (d) lim (1 + sin 2x) 1/x . 

x^cc \X + 1/ x— 




3-50 Determine the following limits which do not necessarily require the result of 
Problem 3-48: 

(a) lim (l + -)l X ; (b) Hm(l + ^ ) ; 

I i\(4* + 3)/(x + 2) 

(c) lim - ; (d) lim (cos x) : 

*-«, \X 2 J x^O 



(e) lim (cos x) 1 /"; (0 lim 



x^O ' r—0 \ 4 * 



,2/**. 



Complex numbers and 
vectors 



4-1 Introductory ideas

A number of important properties of the real number system have already 
been considered, and we shall now examine to what extent quantities repre- 
sentable as displacements in space may be incorporated into a number 
system. The name vector quantity is reserved for all quantities that are 
representable as a displacement in space or, more exactly, as a directed line 
element. Familiar vector quantities are force, magnetic field and velocity, 
which are all representable by a line whose length is proportional to their 
magnitude and whose direction is parallel to the direction of the original 
quantity. In addition, the line of action of a vector has a sense associated 
with it, which means that we must specify a direction along the line to 
indicate the way in which the vector acts. 

Thus to represent a velocity of 3 ft/s in an easterly direction we would 
first adopt a convenient length scale, say 1 in to represent 1 ft/s and then, 
after marking the points of the compass on our paper, we would draw a line 
3 in long in an east-west direction. Finally we would add an arrow to the 
line pointing eastwards to indicate the sense of the velocity. This line could 
be located anywhere on our paper since it does not represent a velocity that 
is associated with any particular point. Reversal of the arrow would corres- 
pond to a reversal of the direction of the velocity, so that the line would then 
represent a velocity of 3 ft/s in a westerly direction. 

Not all quantities are vectors, and another important group are called 
scalars. The word scalar describes any quantity that has magnitude but no 
direction. Typical scalar quantities which have units are temperature, mass 
and pressure. The real numbers are themselves scalars, and are used to describe 
the numerical magnitudes of both scalar and vector quantities, irrespective 
of whether units may be involved. The terms scalar and vector describe 
collectively two important groups of quantities in the real world. It should, 
however, be added that they do not jointly give a complete description of 
all possible physical quantities. Others exist that are neither scalar nor vector, 
though this need not be elaborated here. 

In giving meaning to the square root operation when applied to negative 
numbers, we shall see that a special kind of two-dimensional vector arises. 
Its value in mathematics has proved to be so great that although such vectors 
are restricted to describing vector quantities in a plane, they have been given 
a special name, complex numbers. Because of this restriction, in addition to 




studying complex numbers, we shall need a more general theory of vectors so 
that we can describe the cited examples of vector quantities, and any others 
that may arise, in all possible situations and not just in a plane. 

Despite this limitation of complex numbers, their vector properties are 
still important enough in special situations for them to be in this chapter. 
Their value elsewhere in mathematics however is even greater, and makes 
them a discipline in their own right. The main reason for this is to be found 
in their relationship to real numbers and in the consequences of their intro- 
duction into functional relationships in the roles of independent and dependent 
variables. This latter aspect will be pursued later when we discuss another 
valuable geometrical idea, a conformal transformation. In the meantime we 
shall develop the vector properties and algebra of complex numbers to the 
point of general usefulness in mathematics, postponing until the end of this 
chapter the alternative approach that is necessary for study of general 
three-dimensional vector quantities. As already mentioned, each is valuable as 
a separate discipline, though, as would be expected, each has a separate 
notation and, generally, a quite different field of application. 

The following introduction to complex numbers is based only on a 
knowledge of elementary trigonometric identities, and not until after more 
study of the exponential and trigonometric functions will we unify our 
treatment of these two topics. 

The origin of complex numbers was the desire of eighteenth-century 
mathematicians always to be able to compute the roots of polynomials, 
even when they are of the form 

$$x^2 = -1. \tag{4-1}$$

It was Leonhard Euler (1707-83) who first recognized that the real number 
system was deficient in respect of admitting solutions to all possible poly- 
nomials and, in connection with Eqn (4-1), he proposed that a new number i 
be introduced to extend the number system. In keeping with the mathematical 
beliefs of that period, he called i the unit imaginary number and related it to 
real numbers by requiring that 

$$i^2 = -1. \tag{4-2}$$

If we allow the use of this new symbol, then $i = \sqrt{-1}$ is the positive
square root of minus one, whence Eqn (4-1) may be seen to have the two
roots $x = i$ and $x = -i$. That $x = i$ is a root follows from the definition
of $i$, whilst $x = -i$ is also a root since
$(-i)^2 = (-1)^2 \cdot i^2 = 1 \cdot i^2 = -1$. With the introduction of $i$,
equations such as

$$x^2 = -k,$$

which are slightly more general than Eqn (4-1), can also be solved. The
equation may be re-expressed in the form $x^2 = k \cdot (-1)$, showing that its
roots are $x = i\sqrt{k}$ and $x = -i\sqrt{k}$, where the positive square root
is always taken. For example, if $x^2 = -9$, then the roots are $x = 3i$ and
$x = -3i$.

The success of Euler's idea lies in the fact that only this one new number 
need be introduced to enable solutions to be found to all polynomials, irre- 
spective of their degree. As a first step towards seeing this, consider the 
quadratic equation 

$$ax^2 + bx + c = 0, \tag{4-3}$$

and suppose that $b^2 - 4ac < 0$. Then, setting $4ac - b^2 = m^2$, and formally
applying the usual formula for the roots of a quadratic, we obtain

$$x = \frac{-b \pm \sqrt{-m^2}}{2a}$$

or

$$x = -\left(\frac{b}{2a}\right) \pm i\left(\frac{m}{2a}\right).$$

Hence, denoting the two roots by $x_1$ and $x_2$, they take the form

$$x_1 = -\left(\frac{b}{2a}\right) + i\left(\frac{m}{2a}\right) \quad\text{and}\quad x_2 = -\left(\frac{b}{2a}\right) - i\left(\frac{m}{2a}\right). \tag{4-4}$$

The numbers $x_1$ and $x_2$ are not ordinary numbers since each comprises the
sum of a real number and a multiple of the unit imaginary number $i$. On this
basis it is reasonable to conjecture that each root of any arbitrary polynomial
will be of the same form and, should the multiplier of $i$ be zero, that root
will reduce to a real number.
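As an illustration of Eqn (4-4), the following Python sketch computes the two roots of a quadratic with $b^2 - 4ac < 0$ and substitutes one of them back into the equation. The function name `complex_roots` and the sample coefficients are our own assumptions; note also that Python writes the unit imaginary number as `j` rather than $i$.

```python
import math

def complex_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 from Eqn (4-4), assuming b^2 - 4ac < 0."""
    disc = b * b - 4 * a * c
    if disc >= 0:
        raise ValueError("this sketch expects b^2 - 4ac < 0")
    m = math.sqrt(-disc)            # m^2 = 4ac - b^2
    real = -b / (2 * a)
    imag = m / (2 * a)
    return complex(real, imag), complex(real, -imag)

x1, x2 = complex_roots(1, 2, 5)     # x^2 + 2x + 5 = 0
print(x1, x2)
print(x1**2 + 2 * x1 + 5)           # residual on substituting x1
```

For these coefficients m = 4, so the roots are $-1 + 2i$ and $-1 - 2i$, and the printed residual is zero to rounding error.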

This conjecture is correct, but before we may verify it, we must see how to 
perform arithmetic on numbers of this special type. These are the complex 
numbers already mentioned and, henceforth, we shall always refer to them 
by this name. Unless the exact form of a complex number is needed, it is 
useful to denote it by a single symbol, usually z, so that an arbitrary complex 
number z is of the form 

z = x + iy, (4-5) 

where x and y are real numbers. We call Eqn (4-5) the real-imaginary form 
of a complex number, and refer to x as the real part of z, and to y as the 
imaginary part of z. In symbolic form we write 

x = Re z, y = Im z. (4-6) 

Hence if $z = 4 - 7i$, then Re z = 4 and Im z = −7. We stress that Re z
and Im z are real numbers. The zero complex number is denoted by 0 and
represents the number $z = 0 + i \cdot 0$.

Already, and without proper justification, we have attributed some
reasonable arithmetic properties to $i$. We have, for example, assumed results
such as $\alpha i = i\alpha$ for all real $\alpha$, and
$\sqrt{-\alpha} = \sqrt{-1} \cdot \sqrt{\alpha} = i\sqrt{\alpha}$. To proceed
logically and rigorously it would be necessary to define addition, subtraction, 
multiplication, and division for complex numbers and then to examine the 
applicability of the real number axioms of Chapter 1 in the case of complex 
numbers. This is necessary since whatever the arithmetic laws we now propose 




for complex numbers, they must obviously be in agreement with the real 
number axioms of Chapter 1, whenever the imaginary parts of complex 
numbers are zero. We shall not in fact justify the complex number axioms we 
now formulate, since this is a straightforward matter and provides good 
exercise for the student (see the problems at the end of the chapter). Instead, 
we simply summarize the results, pausing only to discuss in detail the most 
basic operations necessary for the manipulation of complex numbers. 

4-2 Basic algebraic rules for complex numbers 

First we shall agree to denote addition and subtraction of the complex
numbers $z_1$ and $z_2$ in the usual manner by writing $z_1 + z_2$ and
$z_1 - z_2$, respectively. Multiplication of the complex numbers $z_1$ and
$z_2$ will be denoted by juxtaposition thus, $z_1 z_2$. Before going on, and in
order to work with equations, we must define the meaning of equality between two
complex numbers, and then we can define the operations of addition, subtraction,
and multiplication. The following definitions are all phrased in terms of the
arbitrary complex numbers $z_1 = a + ib$ and $z_2 = c + id$.

definition 4-1 We shall say that the two complex numbers $z_1$ and $z_2$ are
equal, and will write $z_1 = z_2$ if, and only if, $a = c$ and $b = d$. That is
if, and only if, their real parts and their imaginary parts are separately
equal.

Example 4-1 Of the complex numbers $z_1$, $z_2$, and $z_3$ defined by
$z_1 = 3 - 2i$, $z_2 = 1 + 3i$, and $z_3 = 3 - 2i$, it is obvious that
$z_1 = z_3$ but that $z_1 \ne z_2$ and $z_2 \ne z_3$.


definition 4-2 By the sum $z_1 + z_2$ will be understood the single complex
number which written in real-imaginary form has a real part that is the sum
of the real parts of $z_1$ and $z_2$, and an imaginary part that is the sum of
the imaginary parts of $z_1$ and $z_2$. Thus for the stated numbers $z_1$ and
$z_2$ we have

$$z_1 + z_2 = (a + c) + i(b + d).$$

Example 4-2 If $z_1 = 2 + i$ and $z_2 = 1 - 3i$, then $z_1 + z_2 = 3 - 2i$.

definition 4-3 By the difference $z_1 - z_2$ will be understood the single
complex number which written in real-imaginary form has a real part that
is the difference of the real parts of $z_1$ and $z_2$ and an imaginary part
that is the difference between the imaginary parts of $z_1$ and $z_2$. Thus for
the stated numbers $z_1$ and $z_2$ we have

$$z_1 - z_2 = (a - c) + i(b - d).$$

Example 4-3 If $z_1 = 5 + 6i$ and $z_2 = 4 - 2i$, then $z_1 - z_2 = 1 + 8i$.




Using these definitions it is easily verified that axioms A-l to A-5 of 
Chapter 1 also apply to complex numbers. To proceed to an examination of 
the other axioms we must define the operation of multiplication. 

definition 4-4 The product $z_1 z_2$, in which $z_1 = a + ib$ and
$z_2 = c + id$, is a single complex number which may be written in
real-imaginary form. The product is carried out algebraically as would be the
ordinary product $(\alpha + \beta)(\gamma + \delta)$, and the final result is
obtained by making the identifications $\alpha = a$, $\beta = ib$,
$\gamma = c$, $\delta = id$ and using the result $i^2 = -1$ to combine the four
terms that result into a real part and an imaginary part. Thus we have

$$z_1 z_2 = (a + ib)(c + id) = ac + iad + ibc + i^2 bd = (ac - bd) + i(ad + bc).$$

Example 4-4 If $z_1 = 2 + 3i$ and $z_2 = 1 - i$, then $z_1 z_2 = 5 + i$. As a
more difficult example let us express $(1 + i)^4 + (1 - i)^4$ in real-imaginary
form.

Now $(1 + i)^4 = (1 + 4i + 6i^2 + 4i^3 + i^4)$ and
$(1 - i)^4 = (1 - 4i + 6i^2 - 4i^3 + i^4)$, but as $i^2 = -1$, $i^3 = -i$, and
$i^4 = 1$, these expressions become $(1 + i)^4 = -4$ and $(1 - i)^4 = -4$.
Hence $(1 + i)^4 + (1 - i)^4 = -8$.
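The rule of Definition 4-4 can be written out as a short routine and compared with Python's built-in complex arithmetic. In the sketch below a complex number $a + ib$ is represented by the pair `(a, b)`; this representation and the name `multiply` are our own illustrative assumptions, and Python writes the unit imaginary number as `j`.

```python
def multiply(z1, z2):
    """Product of z1 = (a, b) and z2 = (c, d), meaning a + ib and c + id."""
    a, b = z1
    c, d = z2
    return (a * c - b * d, a * d + b * c)

product = multiply((2, 3), (1, -1))              # (2 + 3i)(1 - i)
builtin = (2 + 3j) * (1 - 1j)                    # the same product, built in
power_sum = (1 + 1j) ** 4 + (1 - 1j) ** 4        # second part of Example 4-4

print(product, builtin, power_sum)
```

Both forms of the product give $5 + i$, and the sum of fourth powers reproduces the value −8 found in Example 4-4.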

The definitions of addition, subtraction, and multiplication of complex
numbers are used in the obvious manner for the solution of simple equations.
Thus, if $2z - (2 + i) = 4 - 3i$, then adding $(2 + i)$ to both sides of the
equation gives $2z + 0 = (4 - 3i) + (2 + i)$ or $2z = 6 - 2i$ whence
$z = 3 - i$.

In all cases, the reader should memorize the method employed in the 
definitions, and not the quoted formulae. 

With this definition of multiplication it is a simple matter to verify that
axioms M-1 to M-4 and also axiom D-1 apply to complex numbers. When
one of the numbers $z_1$ or $z_2$ reduces to a real number, then the real and
imaginary parts of the other are both scaled by the same factor. If the scale
factor is −1 the sign of the complex number is reversed. To discuss axiom
M-5 and division we need to proceed more carefully.

As it stands, an expression such as $(a + ib)/c$ is well defined as a complex
number, for we may regard $(1/c)$ as a multiplier of $(a + ib)$ and, provided
$c \ne 0$, Definition 4-4 will give the result. In this case a and b are both
scaled by the factor $(1/c)$. However, it is not clear that the more general
expression

$$z_3 = \frac{z_1}{z_2} = \frac{a + ib}{c + id}, \tag{4-7}$$

is reducible to a complex number expressible in real-imaginary form. The
key to this problem is to be found in M-5 itself when we recall that division
is really defined as the operation inverse to multiplication. Hence, we must
rewrite Eqn (4-7) in the equivalent form

$$z_3(c + id) = a + ib, \tag{4-8}$$


120 / COMPLEX NUMBERS AND VECTORS CH 4 

and then try to determine $z_3$. Now it is easily verified that any complex
number $\alpha + i\beta$ when multiplied by the associated complex number $\alpha - i\beta$
gives the real number $\alpha^2 + \beta^2$. Hence, if both sides of Eqn (4-8) are multiplied
by $(c - id)$, the multiplier of $z_3$ will simply become the real number $c^2 + d^2$.
Carrying out this operation, Eqn (4-8) takes the form

$$z_3 (c^2 + d^2) = (a + ib)(c - id) \tag{4-9}$$

whence, dividing by the real number $(c^2 + d^2)$, we find that

$$z_3 = \frac{(ac + bd) + i(bc - ad)}{c^2 + d^2}. \tag{4-10}$$



Equation (4-10) is now in the real-imaginary form of a complex number and 
is the result of the quotient (4-7). Many books take expression (4-10) as the 
formal definition of the quotient (4-7). The definition we shall propose shortly 
is equivalent to Eqn (4-10) in all respects, but its form is much easier to 
memorize. The simplification is achieved by the introduction of a new and 
useful operation called forming the complex conjugate of a complex number. 

Definition 4-5 If $z = a + ib$ is an arbitrary complex number, then the
complex number $\bar{z} = a - ib$ is the complex conjugate of z. The symbol $\bar{z}$
is read 'z bar'. Equivalently, we may state that the complex conjugate of a
number is always obtained by changing the sign of the imaginary part of
that number.

With this definition in mind it is easy to show that the following definition
of the quotient $z_1/z_2$ is equivalent to Eqn (4-10).

Definition 4-6 (division) The quotient $z_1/z_2$ of the two complex numbers
$z_1$ and $z_2$ is the complex number $(z_1 \bar{z}_2)/(z_2 \bar{z}_2)$.

Using this definition it is a straightforward matter to verify axiom M-5 for
complex numbers, provided only that $z_2 \neq 0$.

Example 4-5 We illustrate division by setting $z_1 = 2 + i$ and $z_2 = 3 - 2i$.
Now $\bar{z}_2 = 3 + 2i$ and $z_1/z_2 = (z_1 \bar{z}_2)/(z_2 \bar{z}_2) = (2 + i)(3 + 2i)/[(3 - 2i)(3 + 2i)]$,
whence $z_1/z_2 = (4 + 7i)/13$. By this same method, an equation of the form
$2z(2 + i) = 1 + i$ is seen to have the solution $z = (1 + i)/(4 + 2i)
= (3 + i)/10$.
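Definition 4-6 translates directly into a short computation. The Python sketch below is a hedged illustration: the helper names `conjugate`, `multiply`, and `divide`, and the use of (real, imag) pairs with exact fractions, are choices made here rather than anything prescribed by the text. It forms the quotient as $(z_1 \bar{z}_2)/(z_2 \bar{z}_2)$ and reproduces both results of Example 4-5.

```python
from fractions import Fraction


def conjugate(z):
    """Complex conjugate of z = (a, b): change the sign of the imaginary part."""
    a, b = z
    return (a, -b)


def multiply(z1, z2):
    """Product by Definition 4-4: (ac - bd) + i(ad + bc)."""
    a, b = z1
    c, d = z2
    return (a * c - b * d, a * d + b * c)


def divide(z1, z2):
    """Quotient by Definition 4-6: (z1 * conj(z2)) / (z2 * conj(z2))."""
    numerator = multiply(z1, conjugate(z2))
    denominator = multiply(z2, conjugate(z2))[0]  # real number c^2 + d^2
    if denominator == 0:
        raise ZeroDivisionError("division requires z2 != 0")
    return (Fraction(numerator[0], denominator),
            Fraction(numerator[1], denominator))


first = divide((2, 1), (3, -2))    # (2 + i)/(3 - 2i)
second = divide((1, 1), (4, 2))    # (1 + i)/(4 + 2i)
print(first == (Fraction(4, 13), Fraction(7, 13)))   # True
print(second == (Fraction(3, 10), Fraction(1, 10)))  # True
```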

On account of the fact that $\bar{z}$ is an ordinary complex number, its general
properties are exactly the same as those of any other complex number. Hence
the number axioms that apply to z apply equally well to $\bar{z}$. The following
specially useful results are easily proved, and are related to the arbitrary
complex number $z = x + iy$, to its complex conjugate $\bar{z} = x - iy$ and to
the real number $|z|$ associated with z and defined to be $|z| = (x^2 + y^2)^{1/2}$.
(See Definition 4-7.)



SEC 4-2 BASIC RULES FOR COMPLEX NUMBERS / 121 





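$$z + \bar{z} = 2\,\mathrm{Re}\,z = 2x; \qquad z - \bar{z} = 2i\,\mathrm{Im}\,z = 2iy;$$

$$z\bar{z} = |z|^2; \qquad \overline{\left(\frac{1}{z}\right)} = \frac{1}{\bar{z}};$$

$$\overline{(z^n)} = (\bar{z})^n; \qquad \left|\frac{z_1}{z_2}\right| = \frac{|z_1|}{|z_2|};$$

$$\overline{(z_1 + z_2 + \cdots + z_n)} = \bar{z}_1 + \bar{z}_2 + \cdots + \bar{z}_n;$$

$$\overline{z_1 z_2 \cdots z_n} = \bar{z}_1 \bar{z}_2 \cdots \bar{z}_n.$$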

We now utilize some of these simple properties of the complex conjugate 
operation to prove an important theorem concerning the roots of a poly- 
nomial, and shall then deduce three very useful corollaries. In the process of 
doing so, we shall take as self-evident the fact that a polynomial P(z) of 
degree n has n factors of the form $(z - \zeta)$. These are called linear factors
because they are of degree 1. The numbers $\zeta$ may, or may not, be complex.

Theorem 4-1 If the nth degree polynomial

$$P(z) = a_0 z^n + a_1 z^{n-1} + \cdots + a_n$$

has its coefficients $a_0, a_1, \ldots, a_n$ real, then if $z = \zeta$ is a zero of $P(z)$, so also
is $z = \bar{\zeta}$ a zero of $P(z)$.

Proof Suppose that $z = \zeta$ is a zero of $P(z)$. Then by definition

$$a_0 \zeta^n + a_1 \zeta^{n-1} + \cdots + a_n = 0.$$

Hence, taking the complex conjugate of this equation we may write

$$\overline{(a_0 \zeta^n + a_1 \zeta^{n-1} + \cdots + a_n)} = 0.$$

However, the complex conjugate of a sum is the sum of the complex
conjugates of the individual terms comprising the sum so that

$$\overline{(a_0 \zeta^n + a_1 \zeta^{n-1} + \cdots + a_n)} = \overline{a_0 \zeta^n} + \overline{a_1 \zeta^{n-1}} + \cdots + \overline{a_n}.$$

Now as the $a_r$, $r = 0, 1, \ldots, n$ are real, it follows that $\bar{a}_r = a_r$ and so

$$\overline{a_r \zeta^{n-r}} = \bar{a}_r\,\overline{\zeta^{n-r}} = a_r (\bar{\zeta})^{n-r}, \quad \text{for } r = 0, 1, \ldots, n.$$

Hence,

$$a_0 \bar{\zeta}^n + a_1 \bar{\zeta}^{n-1} + \cdots + a_n = 0,$$

showing that $P(\bar{\zeta}) = 0$. Thus $z = \bar{\zeta}$ is also a zero of $P(z)$.

Paraphrased, Theorem 4-1 asserts that if a polynomial with real coefficients 
has complex zeros, then they must occur in complex conjugate pairs. 

As any zero which is not complex must be real, it follows that we may
formulate a Corollary to Theorem 4-1.

Corollary 4-1 (a) If a polynomial has real coefficients, then those of its
zeros that are not real occur in complex conjugate pairs.

If $z = \zeta$ and $z = \bar{\zeta}$ represent any pair of complex conjugate zeros in
Theorem 4-1, then $(z - \zeta)$ and $(z - \bar{\zeta})$ must both be factors of $P(z)$. Hence their
product $(z - \zeta)(z - \bar{\zeta})$ must also be a factor. Now

$$(z - \zeta)(z - \bar{\zeta}) = z^2 - (\zeta + \bar{\zeta})z + \zeta\bar{\zeta},$$

and as $\zeta + \bar{\zeta} = 2\,\mathrm{Re}\,\zeta$ is a real number and $\zeta\bar{\zeta} = |\zeta|^2$ is also a real number,
it follows that the pair of complex conjugate zeros correspond to a single
quadratic factor with real coefficients. Hence Corollary 4-1 (a) may be
re-phrased thus:

Corollary 4-1 (b) Any polynomial with real coefficients may always be 
factorized into a set of factors which are linear or at most quadratic, each of 
which has real coefficients. Specifically, if the polynomial is of degree n and 
there are m pairs of complex conjugate zeros, then there will be $(n - 2m)$
linear factors with real coefficients and m quadratic factors with real 
coefficients. 

Finally, as an obvious consequence of this last corollary: 

Corollary 4-1 (c) An odd degree polynomial with real coefficients must 
have at least one real zero. 

The significance of these results is best illustrated by an example which 
shows how they may often be used to simplify a difficult problem to the 
point at which the solution may be determined by familiar methods. 

Example 4-6 A polynomial P(z) of degree 5 is defined by the relationship 

$$P(z) = z^5 + 5z^4 + 10z^3 + 10z^2 + 9z + 5.$$

Given that z = i is a zero, deduce the remaining four zeros and use the 
result to express P(z) as the simplest possible product of factors having real 
coefficients. 

Solution First, as the coefficients of $P(z)$ are all real, Theorem 4-1 is applicable.
Hence if $z = i$ is a zero, then so also is $z = -i$. Thus $(z - i)$ and $(z + i)$
are factors, as is their product $(z - i)(z + i) = z^2 + 1$. Using ordinary
long division to divide $P(z)$ by $(z^2 + 1)$ we find that

$$P(z)/(z^2 + 1) = z^3 + 5z^2 + 9z + 5.$$

Hence to find the remaining factors we must now factorize this cubic poly- 
nomial. As the degree is odd, and the coefficients are real, Corollary 4-1 (c) 
applies showing that it must have at least one real zero. At this point we have 
recourse to trial and error to find the real zero which for the purposes of this 
example has been made an integer. 

Thus, setting 

$$Q(z) = z^3 + 5z^2 + 9z + 5,$$

we must find a value $z = z_1$ such that $Q(z_1) = 0$. By inspection we see that
$Q(-1) = 0$, showing that the real zero is $z = -1$. This corresponds to the
linear factor with real coefficients $(z + 1)$. Removing the factor $(z + 1)$
from the cubic by long division, we then find that

$$\frac{P(z)}{(z^2 + 1)(z + 1)} = \frac{Q(z)}{(z + 1)} = z^2 + 4z + 5.$$

Finally we apply the standard formula for the roots of a quadratic to this
expression to obtain the remaining two zeros. Completing the calculation,
these are found to be $z = -2 - i$ and $z = -2 + i$. Thus the five zeros are
$z = i$, $z = -i$, $z = -1$, $z = -2 - i$, and $z = -2 + i$. The required
factorization is

$$P(z) = (z + 1)(z^2 + 1)(z^2 + 4z + 5).$$
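The zeros found in Example 4-6 can be confirmed by direct substitution. The sketch below is illustrative only: the function name `horner` and the coefficient-list layout (highest power first) are assumptions introduced here. It evaluates $P(z)$ by nested multiplication at each claimed zero and also shows the conjugate pairing asserted by Theorem 4-1.

```python
def horner(coeffs, z):
    """Evaluate a polynomial at z; coeffs are listed from the highest power down."""
    result = 0
    for c in coeffs:
        result = result * z + c
    return result


# P(z) = z^5 + 5z^4 + 10z^3 + 10z^2 + 9z + 5
P = [1, 5, 10, 10, 9, 5]
zeros = [1j, -1j, -1, -2 - 1j, -2 + 1j]

for z in zeros:
    # Each value should vanish (up to rounding) if z really is a zero.
    print(z, abs(horner(P, z)) < 1e-12)

# Theorem 4-1: the conjugate of every zero is again a zero.
print(all(abs(horner(P, complex(z).conjugate())) < 1e-12 for z in zeros))
```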



4-3 Complex numbers as vectors 

So far we have discussed the basic arithmetic of complex numbers but have 
not mentioned their vector properties. To do this, and to give a geometrical 
representation of complex numbers, we plot them as points in a plane called 
the complex plane or, sometimes, the z-plane. Specifically, we shall use the 
real part of the complex number as its horizontal or x-coordinate and the
imaginary part of the complex number as its vertical or y-coordinate. Thus
to each complex number there corresponds just one point in the complex
plane and, conversely, to each point in the complex plane there corresponds
just one complex number. The relationship between points and complex
numbers is one-one. In the complex plane, the x-axis is the real axis and the
y-axis is the imaginary axis. Other accounts of this subject often refer to
this geometrical representation of complex numbers as the Argand diagram, in 
honour of its inventor. 



Fig. 4-1 Representation of complex numbers: (a) point representation; (b) vector
representation.



In the complex plane, a complex number may either be considered as a 
point in the plane or, equivalently, as the directed straight line element from 
the origin to the point in question. We shall remember this dual relationship 
between points and vectors but, for simplicity, will usually speak only of 
points in the complex plane. 

This duality between points and vectors is indicated in Fig. 4-1 where the
complex numbers $z = 1$, $z = i$, $z = 2 + i$, $z = 2 - i$, and $z = -1 - \tfrac{1}{2}i$
have been represented as points (Fig. 4-1 (a)) and as vectors (Fig. 4-1 (b)).
In the case of the vector representation, arrows have been added to show that
the vector is drawn from the origin to the point in question.

Notice that if a number, together with its complex conjugate, is plotted
in the complex plane, as for example $2 + i$ and $2 - i$ in Fig. 4-1 (a) and (b),
then geometrically, in both the point and the vector representations, one is
obtainable from the other by reflection in the x-axis as though it were a
mirror.

Instead of adding and subtracting vectors analytically by use of Definitions
4-2 and 4-3, the same result may be achieved entirely geometrically as we now
indicate. Consider the sum of the vectors $z_1 = 2 + i$ and $z_2 = 1 + 2i$.
Analytically $z_1 + z_2 = 3 + 3i$, and Fig. 4-2 (a) shows this result. The same
result may be obtained geometrically by the following construction. If we
wish to add vector $z_2$ to $z_1$, then for the purposes of addition we shall imagine
vector $z_2$ to be freed from the origin, so that it is capable of translation
anywhere in the complex plane, but we shall assume that wherever we re-locate
it in the complex plane it will always be kept parallel to its original position,
and its length and sense will be preserved. The result of adding $z_2$ to $z_1$ is
then achieved by translating $z_2$ in the manner described until its origin is
located at the tip of vector $z_1$. The two arrows of vectors $z_1$ and $z_2$ then point
in the same direction, and the vector $z_1 + z_2$ is the line element directed from
the origin to the tip of the vector $z_2$ in its new position. In Fig. 4-2 (a) this
construction is represented by the lower triangle comprising the parallelogram.
Such triangles are vector triangles.

Fig. 4-2 Algebraic operations with complex numbers: (a) vector addition: $z_1 + z_2$;
(b) vector subtraction: $z_1 - z_2$.

A vector not attached to a specific origin or one which, for the purposes
of combination with another vector, is freed from its origin to be re-located in
some other part of the complex plane will be called a free vector. This is in
contrast to a vector that is attached to a definite origin which we shall call a
bound vector. In the addition of $z_2$ to $z_1$ that we have just performed, $z_1$ was
regarded as a bound vector and $z_2$ as a free vector.

Notice that by the same argument, $z_1$ may be freed and its origin translated
to the tip of the bound vector $z_2$ to form the vector $z_2 + z_1$, which is
the line element directed from the origin to the tip of vector $z_1$ in its new
position. In Fig. 4-2 (a) this construction is represented by the upper triangle
comprising the parallelogram. The fact that both constructions give rise to
the same line representing on the one hand $z_1 + z_2$, and on the other $z_2 + z_1$,
proves that vector addition is commutative, since $z_1 + z_2 = z_2 + z_1$.

Before proceeding with the discussion of subtraction, we first observe that
Definition 4-4 implies that multiplication of the bound vector z by −1
reverses its direction. That is to say its origin remains fixed, but the line
element representing the vector is rotated about the origin through the angle
$\pi$. With this remark in mind we see that subtraction of vector $z_2$ from $z_1$
(Fig. 4-2 (b)) is just a special case of addition in which the vector to be added
is $-z_2$. The vector $-z_2$ is obtained from $z_2$ by reversing the direction of $z_2$,
as is indicated in Fig. 4-2 (b) by the dotted line directed into the fourth
quadrant. The vector $z_1 - z_2$ is then the line element directed from the origin
to the tip of the reversed vector $z_2$ in its new position. In Fig. 4-2 (b) this
construction is shown in the right-hand half of the plane. The same construction,
with the roles of $z_1$ and $z_2$ interchanged, is shown in the left-hand half
of the plane and when compared with the first result proves that
$z_1 - z_2 = -(z_2 - z_1)$. (Why?)

Thus far, complex numbers have been seen to obey the addition, multi- 
plication, and distributive axioms of real numbers, and the reader might be 
forgiven for wondering if there is any significant difference between them and 
the real numbers. The answer is yes. Whereas real numbers can be given a 
natural order according to their size, complex numbers cannot. A glance at 
Fig. 4-1 (b) makes it clear that no natural order exists in the field of complex 
numbers, comprising all numbers in real-imaginary form, since even vectors 
of the same length may be differently directed, for instance the pairs of vectors
$1$ and $i$, and $2 + i$ and $2 - i$. Whereas it makes sense to order the lengths
of vectors, since these are scalar quantities and may be so ordered, the vectors 
themselves have no natural order. To further our argument we now name the 
length of a vector and introduce a notation whereby it may be manipulated 
in equations. 




Fig. 4-3 Modulus and argument representation.

Definition 4-7 (modulus of a vector) The quantity

$$|z| = (x^2 + y^2)^{1/2}$$

is called the modulus of the vector $z = x + iy$. It is the length of the line
element drawn from the origin to the point (x, y) in the complex plane (see
Fig. 4-3).

Example 4-7 If $z = 3 + 4i$, then $|z| = (3^2 + 4^2)^{1/2} = 5$.

Notice that in the special case $\mathrm{Im}\,z = 0$, $|z|$ reduces to the absolute
value of a real number since, as always, the positive square root is involved
in the definition. The following useful results are easily verified:

$$z\bar{z} = |z|^2; \qquad |z_1 z_2| = |z_1| \cdot |z_2|.$$

If either the upper or lower triangles comprising the parallelogram in 
Fig. 4-2 (a) are considered, then clearly, when expressed in terms of the 
modulus, the Euclidean theorem 'the sum of the lengths of any two sides of 
a triangle exceeds the length of the third side' becomes the following inequality 
relating moduli : 

$$|z_1| + |z_2| \geq |z_1 + z_2|. \tag{4-11}$$

Equality will occur only when $z_1$ and $z_2$ are collinear. For obvious reasons
Eqn (4-11) is called the triangle inequality, and it has already been encountered 
in simple form when we discussed the absolute value of the sum of two real 
numbers. An analytic proof of result (4-11) is set as a problem at the end of the 
chapter. 

Another useful inequality relating the moduli of the complex numbers
$z_1$ and $z_2$ is

$$|z_1 + z_2| \geq \big|\,|z_1| - |z_2|\,\big|, \tag{4-12}$$

where again equality occurs only when $z_1$ and $z_2$ are collinear. The proof of
this is also left to the reader as a problem.
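Inequalities (4-11) and (4-12) lend themselves to a quick numerical spot check. The following sketch is not a proof and makes its own assumptions: the function name `satisfies_bounds`, the sample size, the random seed, and the small tolerance for rounding are all choices made here. It tests both bounds on randomly chosen pairs using Python's built-in complex type, whose `abs` returns the modulus.

```python
import random


def satisfies_bounds(z1, z2, tol=1e-12):
    """True if ||z1| - |z2|| <= |z1 + z2| <= |z1| + |z2| (within rounding)."""
    lower = abs(abs(z1) - abs(z2))
    middle = abs(z1 + z2)
    upper = abs(z1) + abs(z2)
    return lower <= middle + tol and middle <= upper + tol


random.seed(0)  # fixed seed so that the run is repeatable
pairs = [
    (complex(random.uniform(-5, 5), random.uniform(-5, 5)),
     complex(random.uniform(-5, 5), random.uniform(-5, 5)))
    for _ in range(1000)
]
print(all(satisfies_bounds(z1, z2) for z1, z2 in pairs))  # True

# Collinear vectors with the same sense give equality in (4-11).
print(abs((1 + 2j) + (2 + 4j)), abs(1 + 2j) + abs(2 + 4j))
```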

Example 4-8 If $z_1 = 3 + 4i$ and $z_2 = 4 + 3i$, then $z_1 + z_2 = 7 + 7i$.
Hence $|z_1| = (3^2 + 4^2)^{1/2} = 5$, $|z_2| = (4^2 + 3^2)^{1/2} = 5$, and $|z_1 + z_2|
= (7^2 + 7^2)^{1/2} = \sqrt{98}$, so that $|z_1| + |z_2| = 10$ and $\big|\,|z_1| - |z_2|\,\big| = 0$. We
have thus verified inequalities (4-11) and (4-12) in this special case, for they
demand that for any $z_1$ and $z_2$

$$\big|\,|z_1| - |z_2|\,\big| \leq |z_1 + z_2| \leq |z_1| + |z_2|$$

which in this case corresponds to the valid inequality

$$0 < \sqrt{98} < 10.$$




4-4 Modulus-argument form of complex numbers

Referring again to Fig. 4-3, we see that the complex number z need not be
specified in the standard form for it may equally well be specified by giving
both the value of $|z|$ and the angle $\theta$ which, by convention, is always measured
positively in an anti-clockwise direction from the x-axis to the line of the
vector z. The angle $\theta$ is the argument of z and we shall write $\theta = \arg z$. The
argument of z is indeterminate with respect to multiples of $2\pi$, because angles
$\theta$ and $\theta + 2k\pi$, where k is any integer, will give rise to the same line on
Fig. 4-3. Later we shall see that this indeterminacy in $\theta$ plays an important
role in the determination of the roots of complex numbers. When $\theta = \arg z$
is restricted to the interval $-\pi < \theta \leq \pi$, it will be termed the principal value
of the argument.

If we define the real number r by the equation $r = |z|$, and still set
$\theta = \arg z$, then the ordered number pair $(r, \theta)$ describes the polar coordinates
of the point z in Fig. 4-3. That is, the radial distance of a point from the
origin together with its bearing measured from a fixed line through the
origin. The relationship between the Cartesian coordinates (x, y) and the
polar coordinates $(r, \theta)$ of the same complex number z is immediate, since
from Fig. 4-3 we have

$$x = r\cos\theta, \qquad y = r\sin\theta \tag{4-13}$$

or, equivalently,

$$r = (x^2 + y^2)^{1/2}, \qquad \cos\theta = \frac{x}{r}, \qquad \sin\theta = \frac{y}{r}. \tag{4-14}$$

Thus the complex number, or vector, $z = x + iy$ may also be written in the
modulus-argument form

$$z = r(\cos\theta + i\sin\theta). \tag{4-15}$$

Because arg z is indeterminate up to an angle $2k\pi$, we must phrase our
definition of equality between two complex numbers carefully when it is to
refer to complex numbers expressed in modulus-argument form.

Definition 4-8 The two numbers $z_1 = r(\cos\theta + i\sin\theta)$ and $z_2 = \rho(\cos\phi
+ i\sin\phi)$ expressed in modulus-argument form will be said to be equal if,
and only if, $r = \rho$ and $\theta = \phi + 2k\pi$.

Equations (4-13) and (4-14) enable immediate interchange between the
modulus-argument and the real-imaginary forms of z, as the following
examples indicate.

Example 4-9

(a) Express $z = -4\sqrt{3} + 4i$ in modulus-argument form;

(b) Express $z = 2 + 5i$ in modulus-argument form;

(c) If $|z| = 3$ and $\arg z = -\pi/10$, express z in real-imaginary form.

Solution (a) From Eqn (4-14), $r = |z| = [(-4\sqrt{3})^2 + 4^2]^{1/2} = 8$, whilst
$\cos\theta = -(4\sqrt{3})/8 = -(\sqrt{3})/2$ and $\sin\theta = 4/8 = \tfrac{1}{2}$, from which we deduce
that the principal value of $\theta$ must lie in the second quadrant with $\theta = \arg z
= 5\pi/6$. Hence, in modulus-argument form

$$z = 8\left(\cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6}\right).$$

Notice that although we could have written $\theta = \arg z = \arctan(-1/\sqrt{3})$,
it would not then have been clear in which quadrant $\theta$ must lie, and,
consequently, we shall always specify $\sin\theta$ and $\cos\theta$ separately.

Solution (b) Again from Eqn (4-14), $r = |z| = (2^2 + 5^2)^{1/2} = \sqrt{29}$, whilst
this time $\cos\theta = 2/\sqrt{29}$ and $\sin\theta = 5/\sqrt{29}$, from which we deduce that
the principal value of $\theta$ must lie in the first quadrant with $\theta = \arg z = 1.1903$
rad. Hence, in modulus-argument form

$$z = \sqrt{29}(\cos 1.1903 + i\sin 1.1903).$$

Solution (c) The result is immediate, since Eqn (4-15) gives

$$z = 3\left\{\cos\left(-\frac{\pi}{10}\right) + i\sin\left(-\frac{\pi}{10}\right)\right\} = 2.8533 - 0.9270i.$$

We now examine the consequences of multiplication and division for
complex numbers expressed in modulus-argument form. Let $z_1$ and $z_2$ be
the two complex numbers:

$$z_1 = r_1(\cos\theta_1 + i\sin\theta_1) \quad\text{and}\quad z_2 = r_2(\cos\theta_2 + i\sin\theta_2). \tag{4-16}$$

Then by direct multiplication we find that

$$z_1 z_2 = r_1 r_2[(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2)],$$

and using the trigonometric identities for $\cos(\theta_1 + \theta_2)$ and $\sin(\theta_1 + \theta_2)$
this may be written as

$$z_1 z_2 = r_1 r_2[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)]. \tag{4-17}$$

We have thus proved that the result of the product $z_1 z_2$ is a complex number
with modulus $|z_1 z_2| = r_1 r_2$ and argument $\arg(z_1 z_2) = \theta_1 + \theta_2 = \arg z_1
+ \arg z_2$. Thus the result of multiplying two complex numbers is to produce
a complex number whose modulus is the product of the two separate moduli
and whose argument is the sum of the two separate arguments (see Fig. 4-4).
A special case results if we write

$$i = \cos\tfrac{1}{2}\pi + i\sin\tfrac{1}{2}\pi. \tag{4-18}$$

It follows that in the z-plane, multiplication by $i$ corresponds geometrically
to an anti-clockwise rotation through $\tfrac{1}{2}\pi$ without any change of size. To
illustrate this, the vectors $iz_1$ and $iz_2$ have been added to Fig. 4-4.




Fig. 4-4 Multiplication and division: $z_1 z_2$, $z_1/z_2$.

By repeated application of Eqn (4-17) it is easily proved that if $z_m
= r_m(\cos\theta_m + i\sin\theta_m)$ for $m = 1, 2, \ldots, n$, then

$$z_1 z_2 \cdots z_n = r_1 r_2 \cdots r_n[\cos(\theta_1 + \theta_2 + \cdots + \theta_n) + i\sin(\theta_1 + \theta_2 + \cdots + \theta_n)]. \tag{4-19}$$

An argument essentially similar to that which gave rise to Eqn (4-17),
but this time using the trigonometric identities for $\cos(\theta_1 - \theta_2)$ and
$\sin(\theta_1 - \theta_2)$, establishes that whenever $z_2 \neq 0$, then with the same notation
we have

$$\frac{z_1}{z_2} = \frac{r_1}{r_2}[\cos(\theta_1 - \theta_2) + i\sin(\theta_1 - \theta_2)]. \tag{4-20}$$

Obviously $|z_1/z_2| = r_1/r_2 = |z_1|/|z_2|$ and $\arg(z_1/z_2) = \theta_1 - \theta_2 = \arg z_1
- \arg z_2$. Expressed in words, this says that the result of dividing two complex
numbers is to produce a complex number whose modulus is the quotient of
the separate moduli and whose argument is the difference of the two separate
arguments.

A most important special case of Eqn (4-19) occurs when all the $z_1, z_2,
\ldots, z_n$ are equal to the same complex number $z = r(\cos\theta + i\sin\theta)$, say.
The result then becomes

$$z^n = r^n(\cos n\theta + i\sin n\theta).$$




Substituting for z and cancelling a real factor $r^n$, we obtain the following
important theorem.

Theorem 4-2 (de Moivre's Theorem)

$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta.$$

A more subtle argument would have yielded the fact that this remarkable 
result is true for all real values of n, and not just for the integral values 
utilized in our proof. This will be undertaken later when the complex exponen- 
tial function has been discussed. 

Theorem 4-2 provides a simple method by which certain forms of trigo- 
nometric identity may be established. One typical example is enough to 
illustrate this. 

Example 4-10 Let us relate $\sin 4\theta$ and $\cos 4\theta$ to sums of powers of $\sin\theta$
and $\cos\theta$. Set $n = 4$ in Theorem 4-2 and expand the left-hand side by the
binomial theorem, using the fact that $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, etc., to
obtain

$$\cos^4\theta + 4i\cos^3\theta\sin\theta - 6\cos^2\theta\sin^2\theta - 4i\cos\theta\sin^3\theta + \sin^4\theta = \cos 4\theta + i\sin 4\theta.$$

Then, recalling that equality of complex numbers means equality of their
real and imaginary parts considered separately, we have the two results:

equality of real parts

$$\cos^4\theta - 6\cos^2\theta\sin^2\theta + \sin^4\theta = \cos 4\theta,$$

and

equality of imaginary parts

$$4(\cos^3\theta\sin\theta - \cos\theta\sin^3\theta) = \sin 4\theta.$$

These are the desired results. It is characteristic of complex numbers that any
single complex equality implies two real equalities, and even if only one is
sought the other will be generated automatically. The same method works
for any positive integral value of n when it will connect $\sin n\theta$ and $\cos n\theta$
with sums of powers of $\sin\theta$ and $\cos\theta$.
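The two identities of Example 4-10 are easy to spot-check numerically. The sketch below is only a sanity check at a few sample angles chosen here, not a derivation; the function names `cos4_from_powers` and `sin4_from_powers` are hypothetical labels for the two expressions obtained by equating real and imaginary parts.

```python
import math


def cos4_from_powers(theta):
    """Real part: cos^4 - 6 cos^2 sin^2 + sin^4, which should equal cos(4 theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return c**4 - 6 * c**2 * s**2 + s**4


def sin4_from_powers(theta):
    """Imaginary part: 4(cos^3 sin - cos sin^3), which should equal sin(4 theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return 4 * (c**3 * s - c * s**3)


for theta in (0.3, 1.0, 2.5, -0.7):
    ok_cos = math.isclose(cos4_from_powers(theta), math.cos(4 * theta), abs_tol=1e-12)
    ok_sin = math.isclose(sin4_from_powers(theta), math.sin(4 * theta), abs_tol=1e-12)
    print(theta, ok_cos, ok_sin)
```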

We shall return to this idea in connection with the exponential function,
and show that it is possible to use de Moivre's theorem to express $\sin^n\theta$
and $\cos^n\theta$ in terms of sums involving $\sin r\theta$ and $\cos r\theta$.

Sometimes Theorem 4-2 can be used to reduce the labour of computation 
as now shown. 

Example 4-11 We shall evaluate $z^{10}$ where $z = 1 + i$. Rather than making
repeated multiplications, or applying the binomial theorem, we write z in
modulus-argument form as $z = \sqrt{2}(\cos\pi/4 + i\sin\pi/4)$, when we have
$z^{10} = (\sqrt{2})^{10}(\cos\pi/4 + i\sin\pi/4)^{10}$. By de Moivre's theorem this becomes

$$z^{10} = 2^5\left(\cos\frac{5\pi}{2} + i\sin\frac{5\pi}{2}\right) = 32i.$$
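The saving in labour offered by de Moivre's theorem can be seen in a few lines of Python. This is a minimal sketch using the standard `cmath` module; the variable names are illustrative. It raises the modulus to the tenth power, multiplies the argument by ten, and compares the result with direct exponentiation.

```python
import cmath

z = 1 + 1j
r, theta = cmath.polar(z)            # r = sqrt(2), theta = pi/4

# de Moivre: z^10 = r^10 (cos 10*theta + i sin 10*theta)
via_de_moivre = cmath.rect(r**10, 10 * theta)
direct = z**10

# Both agree with 32i up to rounding error.
print(abs(via_de_moivre - 32j) < 1e-9)  # True
print(abs(direct - 32j) < 1e-9)         # True
```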



4-5 Roots of complex numbers 

When performing algebra on real numbers the idea of the root of a number 
plays a fundamental part. The same is true when manipulating complex 
numbers, and we now discuss the general ideas involved in determining their 
roots. 

Let p/q be any rational number, where p and q are integers with q supposed
positive. We shall assume that p and q have no common factor.

Definition 4-9 We define $z^{p/q}$ by saying that:

$$w = z^{p/q} \iff w^q = z^p.$$

Let

$$w = \rho(\cos\phi + i\sin\phi) \quad\text{and}\quad z = r(\cos\theta + i\sin\theta). \tag{4-21}$$

Then from Definition 4-9 and de Moivre's theorem we have

$$\rho^q(\cos q\phi + i\sin q\phi) = r^p(\cos p\theta + i\sin p\theta). \tag{4-22}$$

Now from Definition 4-8 it follows that

$$\rho^q = r^p \quad\text{and}\quad q\phi = p\theta + 2k\pi, \tag{4-23}$$

and so

$$\rho = r^{p/q} \quad\text{and}\quad \phi = \frac{p\theta + 2k\pi}{q}. \tag{4-24}$$

The expressions $w = z^{p/q}$ thus have the general form

$$w = z^{p/q} = r^{p/q}\left[\cos\left(\frac{p\theta + 2k\pi}{q}\right) + i\sin\left(\frac{p\theta + 2k\pi}{q}\right)\right] \quad\text{with } k \text{ an integer.} \tag{4-25}$$

It is easily seen that only q different values $w_0, w_1, w_2, \ldots, w_{q-1}$ of w will
result from Eqn (4-25) as the integer k increases through successive integral
values. It is usual to give k the q successive values $k = 0, 1, 2, \ldots, q - 1$.
If k is allowed to increase beyond the value $q - 1$, then the numbers $w_0, w_1,
\ldots, w_{q-1}$ will simply be generated again because of the periodicity properties
of the sine and cosine functions.
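Equation (4-25) is straightforward to turn into a small routine. The sketch below is one possible rendering under stated assumptions: the function name `rational_power` is hypothetical, z is taken to be non-zero, q is positive, and p and q are assumed to have no common factor, as in the text. It returns the q values $w_0, \ldots, w_{q-1}$ and then checks the defining property $w^q = z^p$ of Definition 4-9 for two sample cases chosen here.

```python
import cmath
import math


def rational_power(z, p, q):
    """All q values of z**(p/q) from Eqn (4-25), for k = 0, 1, ..., q - 1.

    Assumes z != 0, q > 0, and p, q integers with no common factor.
    """
    r, theta = cmath.polar(z)          # modulus and principal argument of z
    rho = r ** (p / q)                 # common modulus of every value
    return [
        cmath.rect(rho, (p * theta + 2 * k * math.pi) / q)
        for k in range(q)
    ]


# The three values of (1 + i)^(2/3); each should satisfy w^3 = (1 + i)^2 = 2i.
for w in rational_power(1 + 1j, 2, 3):
    print(w, abs(w**3 - (1 + 1j) ** 2) < 1e-12)

# The three values of i^(-1/3); each should satisfy w^3 = 1/i = -i.
for w in rational_power(1j, -1, 3):
    print(w, abs(w**3 - 1 / 1j) < 1e-12)
```

Letting k run beyond q − 1 would only repeat the same values, which is why the list comprehension stops at `range(q)`.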

Example 4-12 We illustrate the use of Eqn (4-25) by determining the n
numbers w satisfying the equation $w = (1)^{1/n}$. For obvious reasons these are
called the nth roots of unity. Comparing this equation with the general
expression $w = z^{p/q}$ that has just been discussed we see that we must make the
identifications $z = 1$, $p = 1$, and $q = n$. To proceed further we must write
the number unity in its modulus-argument form

$$1 = 1 \cdot (\cos 0 + i\sin 0),$$

so that comparing this with z in Eqn (4-21) we see that the further
identifications $r = 1$ and $\theta = 0$ must be made. Substitution of these quantities into
Eqn (4-25) then gives the result

$$w_k = \cos\frac{2k\pi}{n} + i\sin\frac{2k\pi}{n} \quad\text{with } k = 0, 1, 2, \ldots, n - 1.$$

The result of this calculation with $n = 5$, for example, is to generate the
fifth roots of unity. In Fig. 4-5 these roots are plotted as the numbers $w_0, w_1,
\ldots, w_4$ in the complex plane. They are uniformly distributed around the
unit circle centred on the origin. By making use of the vector properties of
complex numbers we shall usually represent this circle by the convenient
notation $|z| = 1$. (Why?)




Fig. 4-5 Fifth roots of unity.

Fig. 4-6 Roots of $w = (1 + i)^{2/3}$.



Example 4-13 As a slightly more general example we now determine $z^{2/3}$,
when $z = 1 + i$. In this case $p = 2$, $q = 3$, and in modulus-argument form,
$z = \sqrt{2}(\cos\pi/4 + i\sin\pi/4)$ showing that $r = \sqrt{2}$ and $\theta = \pi/4$. Substitution
into Eqn (4-25) gives

$$w = 2^{1/3}\left[\cos\left(\frac{1 + 4k}{6}\right)\pi + i\sin\left(\frac{1 + 4k}{6}\right)\pi\right] \quad\text{with } k = 0, 1, 2.$$

The three roots $w_0$, $w_1$, and $w_2$ are thus:

$$(k = 0):\quad w_0 = 2^{1/3}\left(\cos\frac{\pi}{6} + i\sin\frac{\pi}{6}\right) = 2^{1/3}\left(\frac{\sqrt{3}}{2} + \frac{i}{2}\right),$$

$$(k = 1):\quad w_1 = 2^{1/3}\left(\cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6}\right) = 2^{1/3}\left(-\frac{\sqrt{3}}{2} + \frac{i}{2}\right),$$

$$(k = 2):\quad w_2 = 2^{1/3}\left(\cos\frac{3\pi}{2} + i\sin\frac{3\pi}{2}\right) = -2^{1/3}i.$$

These are plotted in the complex plane in Fig. 4-6, where they are seen to be
uniformly distributed around the circle $|z| = 2^{1/3}$.

Example 4-14 As a final example let us find the roots of the equation

$$w = i^{-1/3}.$$

In terms of the notation of Eqns (4-21) and (4-25), and recalling that we have
agreed always to take q as positive, we have $p = -1$, $q = 3$, and $z = i$.
Now in modulus-argument form

$$i = 1 \cdot \left(\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right)$$

so that $r = 1$ and $\theta = \pi/2$. Hence, substituting into Eqn (4-25), we find that

$$w = \cos\left[\frac{(-\pi/2) + 2k\pi}{3}\right] + i\sin\left[\frac{(-\pi/2) + 2k\pi}{3}\right] \quad\text{with } k = 0, 1, 2.$$

Hence the three roots $w_0$, $w_1$, and $w_2$ are:

$$(k = 0):\quad w_0 = (\cos\pi/6 - i\sin\pi/6) = \tfrac{1}{2}(\sqrt{3} - i),$$

$$(k = 1):\quad w_1 = (\cos\pi/2 + i\sin\pi/2) = i,$$

$$(k = 2):\quad w_2 = (\cos 7\pi/6 + i\sin 7\pi/6) = -\tfrac{1}{2}(\sqrt{3} + i).$$

This completes our preliminary encounter with complex numbers, and 
our study will be resumed later in connection with the complex exponential 
function and with functions of a complex variable. The remainder of this 
chapter is devoted to developing the foundations of our study of general 
vectors. 

4-6 Introduction to space vectors 

It is clear that any set of vector quantities that do not all lie in a plane cannot 
be represented vectorially in the form of complex numbers. For example, 






even the vectors describing the velocity of a vehicle as it is driven at constant 
speed past fixed points on a winding hill could not be so represented. Pair- 
wise these velocity vectors define planes, and so could be represented by 
complex numbers in those planes, though different pairs of vectors would 
define different planes, thereby making any general representation impossible 
in terms of complex numbers. The trouble here is not hard to find. It is that 
complex numbers just happen to be capable of representation as planar 
vectors with their own appropriate descriptive language, and they were not 
developed with general vector representation in mind. In short, they are 
complex numbers first and vectors second; not the other way around. 

To overcome this limitation and to be able to describe arbitrary vector 
quantities we must preserve the idea of a vector as a directed length, but 
re-think its description. This is best achieved using a diagram, so consider 
Fig. 4-7 which depicts the mutually perpendicular Cartesian axes O{x, y, z}
with origin O. In more mathematical terms we describe these axes as being 
mutually orthogonal. This is a technical term that in a geometrical context 
has the same meaning as perpendicular, though it is often used in a wider 
sense, when the word perpendicular would be inappropriate. Henceforth we 
shall almost always use the term orthogonal. 

The manner of identification of the x, y, and z coordinate axes is not
arbitrary, but is made in such a manner that they form a right-handed system
of axes. By this we mean that having assigned axes for the variables x and y,
together with the directions in which they increase positively, the direction of
positive z is then chosen to be that in which a right-handed screw would
advance were it aligned with the third axis and rotated in the sense x to y.
This sense of rotation is indicated in Fig. 4-7 by means of a directed spiral
about the z-axis. In the diagram the y- and z-axes are supposed to lie in the
plane of the paper with the x-axis pointing out of the paper towards the
viewer. Later we shall refer to this right-handed property in connection with
axes which are not orthogonal, when right-handedness is still to be interpreted
in exactly the same sense as above.

Fig. 4-7 Right-handed Cartesian axes.

This right-handed property of the system of axes is shared by each pair of 
axes in turn, provided the senses of rotation are appropriately defined. The 
following table describes the convention that is always adopted. 

Table 4-1 Right-handed axes

    Rotate              R-H screw advances
    From    To          in direction of positive

    x       y           z
    y       z           x
    z       x           y



The table can easily be remembered in the concise form 

    x y z
    y z x
    z x y

where the entry in any row is obtained from the entry in the row above by 
transferring the first letter of that entry to the last position. These entries are 
called cyclic permutations of the letters x, y, and z, and further cyclic
permutations will simply regenerate the table. These rules describe the
right-handed symmetry of the O{x, y, z} axes. If any two letters in an entry are
interchanged, then by the same rule, the negative direction of the third axis is
defined. Hence the set of letters y x z are to be interpreted 'rotate from y to 
x to make a right-handed screw aligned with the z-axis advance in the direction 
of negative z'. 

If in the above argument a right-handed screw motion had been replaced 
by a left-handed screw motion, then a left-handed system of axes would have 
resulted. Although a left-handed system of axes is in all respects equivalent to 
a right-handed system for the purposes of vector representation, it is customary 
to work with right-handed systems. 

Let P be the point with coordinates $x = a_1$, $y = a_2$, and $z = a_3$ illustrated
in Fig. 4-7. We shall denote it by the more concise notation $(a_1, a_2, a_3)$ where
the first, second, and third entries in this ordered number triple represent the
x, y, and z coordinates, respectively. Then from the point of view of coordinate
geometry it is the point P that is of interest, whereas from the point of view
of vectors it is the directed line element from O to P that is of interest. To
signify that it is the vector quantity that interests us here we shall write OP.
Notice that by this convention the vector PO is the directed line from P to O
and is opposite in sense to OP. In future we will denote the length of the vector
OP by |OP|, which is a scalar, and by definition this length will always be
positive.

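In Fig. 4-7 the lengths $OA = a_1$, $OB = a_2$, and $OC = a_3$ are called the
orthogonal projections of OP onto the x-, y-, and z-axes, and a simple
application of Pythagoras' theorem gives the result

$$|\mathbf{OP}|^2 = (OA)^2 + (OB)^2 + (OC)^2$$

or,

$$|\mathbf{OP}|^2 = a_1^2 + a_2^2 + a_3^2.$$

Dividing by $|\mathbf{OP}|^2$ this becomes

$$1 = \left(\frac{a_1}{|\mathbf{OP}|}\right)^2 + \left(\frac{a_2}{|\mathbf{OP}|}\right)^2 + \left(\frac{a_3}{|\mathbf{OP}|}\right)^2.$$

If $\theta_1$, $\theta_2$, and $\theta_3$ denote the angles that OP makes with the x-, y-, and
z-axes, respectively, then $a_1/|\mathbf{OP}| = \cos\theta_1$, $a_2/|\mathbf{OP}| = \cos\theta_2$, and
$a_3/|\mathbf{OP}| = \cos\theta_3$, so that this result can be rewritten in terms of the
angles $\theta_1$, $\theta_2$, $\theta_3$ as

$$1 = \cos^2\theta_1 + \cos^2\theta_2 + \cos^2\theta_3. \tag{4-26}$$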

If the numbers l, m, and n are defined by the relations

$$l = \cos\theta_1, \quad m = \cos\theta_2, \quad n = \cos\theta_3, \tag{4-27}$$

then Eqn (4-26) becomes

$$1 = l^2 + m^2 + n^2. \tag{4-28}$$

For obvious reasons l, m, and n are called the direction cosines of OP with
respect to the axes O{x, y, z} and it is often convenient to write them in the
form of an ordered number triple as {l, m, n}. The angles $\theta_1$, $\theta_2$, and $\theta_3$ are
indeterminate to within a multiple of $2\pi$ and, by convention, they will always
be taken to lie in the interval $[0, \pi]$.

Consider the direction cosines l, m, n as defining a point P' in space with
coordinates x = l, y = m, and z = n, then, by Pythagoras' theorem and
Eqn (4-28), the vector OP' must have unit length. The direction and sense of
OP' are the same as those of OP; only the lengths are different. Vectors of 
unit length in given directions prove to be extremely useful in vector analysis 
so they are appropriately called unit vectors. 

Now by definition, the direction cosines l, m, n are proportional to the
coordinates $a_1$, $a_2$, $a_3$ of the point P and consequently the numbers $a_1$, $a_2$,
and $a_3$ are often called the direction ratios of OP. To convert direction ratios
to direction cosines it is necessary to normalize them by dividing by the
square root of the sum of the squares of the direction ratios. This is, of course,
equivalent to division by the quantity we have agreed to denote by |OP|.

Example 4-15 Find the direction ratios, the direction cosines and the angles
$\theta_1$, $\theta_2$, and $\theta_3$ of the vector OP, where P is the point (1, -2, 4).

Solution The direction ratios are 1, -2, 4, and |OP|, which is the square
root of the sum of the squares of the direction ratios, is

$$|\mathbf{OP}| = (1^2 + (-2)^2 + 4^2)^{1/2} = \sqrt{21}.$$

Hence the direction cosines of OP are $l = 1/\sqrt{21}$, $m = -2/\sqrt{21}$, and
$n = 4/\sqrt{21}$, from which the angles $\theta_1$, $\theta_2$, and $\theta_3$ are seen to be 1.351, 2.022,
and 0.510 radians, respectively. Unless otherwise stated we shall always
express angles in terms of radians, as here.
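
The normalization used in Example 4-15 is easily checked numerically. The short Python sketch below is offered only as an illustration; the function name direction_cosines is our own choice and does not come from the text.

```python
import math

def direction_cosines(ratios):
    """Normalize direction ratios (a1, a2, a3) to direction cosines (l, m, n)."""
    length = math.sqrt(sum(c * c for c in ratios))  # this is |OP|
    if length == 0.0:
        raise ValueError("the zero vector has no direction cosines")
    return tuple(c / length for c in ratios)

# Example 4-15: P is the point (1, -2, 4).
l, m, n = direction_cosines((1.0, -2.0, 4.0))

# math.acos returns values in [0, pi], which matches the convention
# adopted for the angles theta_1, theta_2, theta_3.
angles = [math.acos(c) for c in (l, m, n)]
print([round(t, 3) for t in angles])
```

The sum l*l + m*m + n*n should equal 1 to within rounding error, in agreement with Eqn (4-28).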

Example 4-16 Determine the angles of inclination $\theta_1$, $\theta_2$, and $\theta_3$ of a vector
to the x-, y-, and z-axes, respectively, given that its direction cosines are:

(a) $\{1/2, -\sqrt{3}/2, 0\}$,
(b) $\{1/2, 1/4, \sqrt{11}/4\}$.

Solution (a) Here $l = \cos\theta_1 = 1/2$, $m = \cos\theta_2 = -\sqrt{3}/2$, $n = \cos\theta_3 = 0$,
so that $\theta_1 = \pi/3$, $\theta_2 = 5\pi/6$, and $\theta_3 = \pi/2$. Hence in this case the vector
lies entirely in the (x, y)-plane.

Solution (b) In this case, $l = \cos\theta_1 = 1/2$, $m = \cos\theta_2 = 1/4$,
$n = \cos\theta_3 = \sqrt{11}/4$, so that $\theta_1 = \pi/3$, $\theta_2 = 1.318$, and $\theta_3 = 0.593$.

Example 4-17 If a vector has direction cosines $\{1/2, m, 1/2\}$ deduce the possible
values of m. If, in addition, it is stated that the vector makes an obtuse angle
$\theta_2$ with the y-axis determine the value of $\theta_2$.

Solution We use Eqn (4-28), setting $l = 1/2$ and $n = 1/2$ to obtain

$$(1/2)^2 + m^2 + (1/2)^2 = 1.$$

Whence, $m^2 = 1/2$ or $m = \pm 1/\sqrt{2}$. These values of m correspond to
$\theta_2 = \pi/4$ for $m = 1/\sqrt{2}$, and to $\theta_2 = 3\pi/4$ for $m = -1/\sqrt{2}$. As the angle $\theta_2$
is required to be obtuse we must select $\theta_2 = 3\pi/4$.
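
The reasoning of Example 4-17 may be sketched in a few lines of Python. This is only an illustrative check under the assumption that two of the three direction cosines are known; the helper name missing_cosine is hypothetical.

```python
import math

def missing_cosine(l, n):
    """Return the two possible values of m satisfying l^2 + m^2 + n^2 = 1."""
    m_squared = 1.0 - l * l - n * n
    if m_squared < 0.0:
        raise ValueError("no real m exists because l^2 + n^2 exceeds 1")
    root = math.sqrt(m_squared)
    return root, -root

# Example 4-17: l = n = 1/2.
m_pos, m_neg = missing_cosine(0.5, 0.5)

# The positive root gives the acute angle and the negative root the obtuse one,
# because math.acos maps [-1, 1] onto [0, pi].
theta_acute = math.acos(m_pos)
theta_obtuse = math.acos(m_neg)
```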

The idea of a fixed origin is fundamental to coordinate geometry though 
it proves to be rather too restrictive in vector analysis. This is because it is
only the magnitude, direction, and sense of a vector that usually matter, and
not the choice of origin and coordinate system in which the vector is repre- 
sented. For example, when specifying a wind velocity it is normally sufficient 
to say 20 ft/s due East, without identifying the particular points in space at 
which the air has this velocity. 

In vector work this ambiguity as to the location of a vector in space is
allowed for by regarding as equivalent any two vectors that may be represented
by directed line elements of equal length which are parallel, and have
the same sense. In Fig. 4-8 we have depicted two vectors OP and O'P' that
are equivalent in the sense just defined. Another way to define this equivalence
is to require that when the axes O{x, y, z} are translated, without rotation, to
the position O'{x', y', z'}, the coordinates of P' with respect to the axes
through O' are the same as those of P with respect to the axes through O.
That is, if P is the point $(a_1, a_2, a_3)$ in the system of axes O{x, y, z}, then P'
is the point $(a_1, a_2, a_3)$ in the system of axes O'{x', y', z'}. Do not get confused
by this. If O' is the point $(\alpha_1, \alpha_2, \alpha_3)$ with respect to O{x, y, z}, then
coordinates in the unprimed system are related to those in the primed system by
the equations $x = \alpha_1 + x'$, $y = \alpha_2 + y'$, and $z = \alpha_3 + z'$.

Fig. 4-8 Translation of axes without rotation.

This freedom to translate vectors now enables us to give direction cosines
to any vector in space and not just to those having their base at O. Suppose,
for example, that we require the length and direction cosines of the vector AB,
where A is the point $(a_1, a_2, a_3)$ and B is the point $(b_1, b_2, b_3)$ when expressed
relative to some set of axes O{x, y, z}. Then we see at once that the lengths
of the projections of AB on the x, y, and z axes are $(b_1 - a_1)$, $(b_2 - a_2)$,
and $(b_3 - a_3)$, respectively. Accordingly, by translating the vector AB until
A in its new position A' coincides with O, we see that the tip B in its new
position B' must be the point $(b_1 - a_1, b_2 - a_2, b_3 - a_3)$ (see Fig. 4-9).
Hence |AB|, that is the length of AB, is

$$|\mathbf{AB}| = [(b_1 - a_1)^2 + (b_2 - a_2)^2 + (b_3 - a_3)^2]^{1/2}. \tag{4-29}$$

Fig. 4-9 Translation of a vector.



The direction cosines of AB then follow as before and are

$$l = \frac{b_1 - a_1}{|\mathbf{AB}|}, \quad m = \frac{b_2 - a_2}{|\mathbf{AB}|}, \quad n = \frac{b_3 - a_3}{|\mathbf{AB}|}. \tag{4-30}$$

Example 4-18 Find |AB| and the direction cosines of the vector AB, if A
has coordinates (1, 2, 3) and B the coordinates (4, 3, 6).

Solution From Eqn (4-29) we see that $|\mathbf{AB}| = [(4 - 1)^2 + (3 - 2)^2 + (6 - 3)^2]^{1/2} = \sqrt{19}$,
whilst from Eqn (4-30) it follows that $l = 3/\sqrt{19}$,
$m = 1/\sqrt{19}$, and $n = 3/\sqrt{19}$.
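
Equations (4-29) and (4-30) translate directly into a short computation. The following Python sketch is illustrative only; the function name length_and_direction_cosines is our own and the points are those of Example 4-18.

```python
import math

def length_and_direction_cosines(a, b):
    """Return |AB| and the direction cosines of AB for the points a and b."""
    diff = [bi - ai for ai, bi in zip(a, b)]        # projections of AB on the axes
    length = math.sqrt(sum(d * d for d in diff))    # Eqn (4-29)
    if length == 0.0:
        raise ValueError("A and B coincide, so AB has no direction")
    return length, tuple(d / length for d in diff)  # Eqn (4-30)

# Example 4-18: A = (1, 2, 3) and B = (4, 3, 6).
length, (l, m, n) = length_and_direction_cosines((1, 2, 3), (4, 3, 6))
print(length ** 2, l, m, n)
```

All three direction cosines are positive here because each coordinate of B exceeds the corresponding coordinate of A.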

It is now convenient to introduce a triad of unit vectors, denoted by i, j, 
and k, that are parallel to and are directed in the positive senses of the x-, y-, 
and z-axes, respectively. Here we remind the reader that these are called unit 
vectors because they are each of unit length on the x-, y-, and z-length scales. 
Notice that the term right-handed that was applied to the system of axes 
O{x, y, z} also applies to the triad of vectors i, j, k when taken in this order.
We shall use this idea again later. 

An arbitrary vector in any one of the i, j, or k directions may then be 
obtained by scaling the length of the appropriate unit vector by a multiplication
factor $\mu$. Thus a vector three times the size of the unit vector i will be
written 3i, whilst a vector twice the size of the unit vector k, but oppositely 
directed, will be written —2k. 

Returning to Fig. 4-7 we see that in terms of i, j, and k, the vectors OA,
OB, and OC may be written as

$$\mathbf{OA} = a_1\mathbf{i}, \quad \mathbf{OB} = a_2\mathbf{j}, \quad \mathbf{OC} = a_3\mathbf{k}.$$

From our ideas of vector addition in a plane the vector OQ lying in the
(x, y)-plane is OQ = OA + AQ or, because vectors may be translated,
OQ = OA + OB. Now in terms of our unit vector notation this may be
written $\mathbf{OQ} = a_1\mathbf{i} + a_2\mathbf{j}$. Turning attention to the plane containing points O,
Q, and P, we see that by the same argument OP = OQ + QP. Again,
because vectors may be translated, QP = OC so that finally, on substituting
for OQ and QP in the equation OP = OQ + QP, we obtain

$$\mathbf{OP} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}. \tag{4-31}$$

For ease of notation, arbitrary vectors, like unit vectors, will usually be
denoted by a single symbol such as a, α, or r. Thus a general point P in space
with coordinates (x, y, z) will often be written

$$\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}. \tag{4-32}$$

The almost universally accepted convention which we adopt here is to denote
vector quantities by bold face type and scalar quantities by italic type.

Because a vector such as that in Eqn (4-32) identifies a point P in space
it is called a position vector. In the vector representation Eqn (4-31) the numbers
$a_1$, $a_2$, and $a_3$ are called the components of OP.

Two vectors will only be said to be equal if, when written in the form
of Eqn (4-31), their corresponding components are equal. The vector
$\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ will be said to be a scalar multiple $\lambda$ of vector
$\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$, and we will write $\mathbf{a} = \lambda\mathbf{b}$ if, and only if,
$a_1 = \lambda b_1$, $a_2 = \lambda b_2$, and $a_3 = \lambda b_3$. In the special case $\lambda = -1$ we have
a = -b, showing that |a| = |b|, but that the senses of a and b are opposite.
Thus in Fig. 4-7 we have OP = -PO.

The zero or null vector is the vector whose three components are each 
identically zero. It is often denoted by the ordinary symbol 0 instead of the
bold-face $\mathbf{0}$, since confusion is
unlikely to arise on account of this simplification of the notation. Following
on from our first ideas of vectors, and in accordance with the derivation of 
Eqn (4-31), we now define the operations of addition and subtraction of 
vectors. 

Definition 4-10 Let a and b be arbitrary vectors with components
$(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$, respectively, so that they may be written
$\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$. Then we define the sum
a + b of the two vectors a and b to be the vector
$(a_1 + b_1)\mathbf{i} + (a_2 + b_2)\mathbf{j} + (a_3 + b_3)\mathbf{k}$. The difference a - b of the two vectors
a and b is defined to be the vector $(a_1 - b_1)\mathbf{i} + (a_2 - b_2)\mathbf{j} + (a_3 - b_3)\mathbf{k}$.

Because real numbers are commutative with respect to addition, it follows 
directly from this definition that the operation of vector addition is commuta- 
tive. That is we have a + b = b + a for all vectors a and b. When the sub- 
traction operation is considered the properties of real numbers imply the 
result a — b = — (b — a) for all vectors a and b. 

Example 4-19 If a = i + j + 2k and b = 3i - 3j + k, then a + b
= (1 + 3)i + (1 - 3)j + (2 + 1)k, showing that a + b = 4i - 2j + 3k.
Reversal of the order of the sum followed by the same argument confirms the
commutative property a + b = b + a for these particular vectors. In the
case of subtraction we have a - b = (1 - 3)i + (1 - (-3))j + (2 - 1)k,
showing that a - b = -2i + 4j + k. It is easily established that a - b
= -(b - a).
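
Definition 4-10 amounts to componentwise arithmetic on ordered number triples. As a minimal sketch, assuming a vector is stored as a Python tuple of its three components (a representation chosen here purely for illustration), Example 4-19 may be reproduced as follows.

```python
def add(a, b):
    """Sum a + b, formed component by component (Definition 4-10)."""
    return tuple(x + y for x, y in zip(a, b))

def subtract(a, b):
    """Difference a - b, formed component by component (Definition 4-10)."""
    return tuple(x - y for x, y in zip(a, b))

def negate(a):
    """The vector -a, equal in modulus to a but opposite in sense."""
    return tuple(-x for x in a)

a = (1, 1, 2)    # i + j + 2k
b = (3, -3, 1)   # 3i - 3j + k

print(add(a, b))        # (4, -2, 3)
print(subtract(a, b))   # (-2, 4, 1)
```

The same helpers confirm the two general properties quoted in the text, namely a + b = b + a and a - b = -(b - a), for these particular vectors.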



Although these particular results could be illustrated diagrammatically,
the vector triangles involved would look essentially the same as those used
earlier in connection with addition and subtraction of complex numbers and
would be arrived at by the same reasoning. Rather than illustrate this specific
case, we present in Fig. 4-10 the results of addition and subtraction of arbitrary
vectors a and b. Because a geometrical projection method is necessary to
illustrate three-dimensional problems on a sheet of paper, such diagrams are
much less useful as a tool than was the case in a plane. Accordingly, we shall
usually concentrate on an analytical approach to vectors, using diagrams
only when they seem likely to be helpful.

Fig. 4-10 Addition and subtraction of vectors.

Two terms worthy of note that are applied to vectors are the names 
parallel and anti-parallel. Two vectors will be said to be parallel when their 
lines of action are parallel and their senses are the same. Conversely, two 
vectors will be said to be anti-parallel when their lines of action are parallel 
but their senses are opposite. Thus if a is a vector and $\mu$ is a scalar, the vectors
a and $\mu\mathbf{a}$ are parallel if $\mu > 0$ and are anti-parallel if $\mu < 0$. It follows that
two vectors will be parallel if their corresponding direction cosines are equal 
and they will be anti-parallel if their corresponding direction cosines are equal 
in magnitude but opposite in sign. 

Fig. 4-11 Position vectors defining the vector AB.

Example 4-20 The vectors a = i + 2j - 4k and b = 3i + 6j - 12k are
such that we may write 3a = b. Since the scalar 3 > 0 it follows that a and
b are parallel. However the vectors c = i - 3j + k and d = -2i + 6j - 2k
are such that we may write -2c = d and, as the scalar -2 < 0, it follows
that c and d are anti-parallel. By the same argument, the two vectors
p = 3i - j + 2k and q = 6i + 2j + 4k are neither parallel nor anti-parallel,
since for no scalar $\mu$ is it true that $\mu\mathbf{p} = \mathbf{q}$.
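
The test applied in Example 4-20 can be expressed as a small procedure: look for a scalar that carries the first vector into the second, then inspect its sign. The Python sketch below is one possible arrangement and assumes that the first vector is non-zero; the names scale_factor and classify are hypothetical.

```python
def scale_factor(a, b, tol=1e-12):
    """Return mu such that b = mu * a, or None if no such scalar exists.

    The vector a is assumed to be non-zero.
    """
    # Take mu from the first non-zero component of a ...
    index = next(i for i, x in enumerate(a) if abs(x) > tol)
    mu = b[index] / a[index]
    # ... and accept it only if it reproduces every component of b.
    if all(abs(mu * x - y) <= tol for x, y in zip(a, b)):
        return mu
    return None

def classify(a, b):
    """Describe b relative to a as parallel, anti-parallel, or neither."""
    mu = scale_factor(a, b)
    if mu is None or mu == 0:
        return "neither"
    return "parallel" if mu > 0 else "anti-parallel"

print(classify((1, 2, -4), (3, 6, -12)))
print(classify((1, -3, 1), (-2, 6, -2)))
print(classify((3, -1, 2), (6, 2, 4)))
```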

The length of the vector AB which we have already denoted by |AB| 
is a useful quantity and, as with complex numbers, is called the modulus of 
the vector AB. Its formal definition follows. 

Definition 4-11 The modulus |a| of the vector $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ is
the positive square root

$$|\mathbf{a}| = (a_1^2 + a_2^2 + a_3^2)^{1/2}.$$

It is an immediate consequence of this definition that any vector r with
direction cosines {l, m, n} may be written in the form

$$\mathbf{r} = |\mathbf{r}|(l\mathbf{i} + m\mathbf{j} + n\mathbf{k}). \tag{4-33}$$

The proof of this is obvious for by definition, $l|\mathbf{r}|$ is the x-component of r,
$m|\mathbf{r}|$ is the y-component, and $n|\mathbf{r}|$ is the z-component. The form of Eqn (4-33)
shows that any vector may be expressed as the product of a scalar (its modulus)
and a unit vector defining its direction and sense.




When it is necessary to define an arbitrary vector AB in space, this may 
easily be accomplished by using position vectors a and b to identify its end 
points A and B. This is illustrated in Fig. 4-11 from which, by the rules of 
vector addition, we may write 

OA + AB = OB 
or, 

AB = OB - OA = b - a. 

Examination of this simple but useful result suggests that an accurate name 
for the vector AB would be the 'position vector of B relative to A', since in 
this role it is A that plays the part of the origin. This more exact name is 
seldom used since the symbol AB is sufficiently clear as it stands. 

Example 4-21 Let points A and B be identified by the position vectors
a = -2i - 3j + k and b = 3i - j + 4k, respectively. Find the vector AB
together with its modulus and direction cosines.

Solution The diagram in Fig. 4-11 can be taken to represent this situation
showing that vector AB = b - a. Substituting for the values of a and b,
we find AB = (3i - j + 4k) - (-2i - 3j + k), whence AB = 5i + 2j + 3k.
Then $|\mathbf{AB}| = (5^2 + 2^2 + 3^2)^{1/2} = \sqrt{38}$ after which the usual argument
establishes that $l = 5/\sqrt{38}$, $m = 2/\sqrt{38}$, and $n = 3/\sqrt{38}$.

By considering the plane containing the vectors a, b, and b - a in Fig. 
4-11, the arguments that established the triangle inequalities for complex 
numbers also establish them for arbitrary space vectors. Hence for arbitrary 
vectors a and b we have 

$$\big|\,|\mathbf{a}| - |\mathbf{b}|\,\big| \le |\mathbf{a} + \mathbf{b}| \le |\mathbf{a}| + |\mathbf{b}|. \tag{4-34}$$

Finally, to close this section, let us find the angle between two vectors
a and b with the direction cosines $\{l_1, m_1, n_1\}$ and $\{l_2, m_2, n_2\}$, respectively.
When the lines of action of the vectors intersect the angle $\theta$ is well defined
and, by convention, is always chosen to lie in the interval $[0, \pi]$. If the lines
of action of two vectors do not intersect then they are merely translated until
they do, when the angle $\theta$ is defined as above. It will suffice to consider the
angle between two unit vectors directed along a and b since the length of the
vectors will obviously not influence the angle between them. From Eqn
(4-33), these unit vectors are seen to be $(l_1\mathbf{i} + m_1\mathbf{j} + n_1\mathbf{k})$ and
$(l_2\mathbf{i} + m_2\mathbf{j} + n_2\mathbf{k})$. These are shown in Fig. 4-12. They have their tips P and Q
at the respective points $(l_1, m_1, n_1)$ and $(l_2, m_2, n_2)$.



Fig. 4-12 Angle between two lines.

Now, by the cosine rule

$$|\mathbf{PQ}|^2 = |\mathbf{OP}|^2 + |\mathbf{OQ}|^2 - 2|\mathbf{OP}|\,|\mathbf{OQ}|\cos\theta, \tag{4-35}$$

but $|\mathbf{OP}| = |\mathbf{OQ}| = 1$, and by Eqn (4-29),
$|\mathbf{PQ}|^2 = (l_2 - l_1)^2 + (m_2 - m_1)^2 + (n_2 - n_1)^2$, whilst by Eqn (4-28),
$l_1^2 + m_1^2 + n_1^2 = l_2^2 + m_2^2 + n_2^2 = 1$. Expanding the squares thus gives
$|\mathbf{PQ}|^2 = 2 - 2(l_1 l_2 + m_1 m_2 + n_1 n_2)$, while the right-hand side of Eqn (4-35)
reduces to $2 - 2\cos\theta$. Consequently, substituting into Eqn (4-35) and
simplifying, we find the desired result

$$\cos\theta = l_1 l_2 + m_1 m_2 + n_1 n_2. \tag{4-36}$$

The angle of inclination $\theta$ follows directly from this equation. The restriction
of the angle between the vectors to the interval $[0, \pi]$ means that in Fig. 4-12,
it is the angle $\theta$ that is selected, and not the angle $\theta'$.
As a particular case, if

$$l_1 l_2 + m_1 m_2 + n_1 n_2 = 0, \tag{4-37}$$

then the two vectors a and b must be orthogonal.



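Example 4-22 Find the angle of inclination $\theta$ between the vectors
a = i + 2j + 3k and b = 2i - j - k.

Solution Here $|\mathbf{a}| = \sqrt{14}$, $|\mathbf{b}| = \sqrt{6}$, so that the direction cosines
$\{l_1, m_1, n_1\}$ of a are $l_1 = 1/\sqrt{14}$, $m_1 = 2/\sqrt{14}$, $n_1 = 3/\sqrt{14}$ whilst the
direction cosines $\{l_2, m_2, n_2\}$ of b are $l_2 = 2/\sqrt{6}$, $m_2 = -1/\sqrt{6}$,
$n_2 = -1/\sqrt{6}$. Hence by Eqn (4-36), the angle $\theta$ is the solution of the equation

$$\cos\theta = \left(\frac{1}{\sqrt{14}}\right)\left(\frac{2}{\sqrt{6}}\right) + \left(\frac{2}{\sqrt{14}}\right)\left(\frac{-1}{\sqrt{6}}\right) + \left(\frac{3}{\sqrt{14}}\right)\left(\frac{-1}{\sqrt{6}}\right),$$

or

$$\theta = \arccos\left(\frac{-3}{2\sqrt{21}}\right).$$

On account of the restriction of $\theta$ to the interval $[0, \pi]$ it finally follows that
$\theta = 1.904$ rad.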
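
Equation (4-36) may be evaluated numerically as in the following Python sketch, which is given only as an illustration; the function name angle_between is our own, and both vectors are assumed to be non-zero.

```python
import math

def angle_between(a, b):
    """Angle in [0, pi] between non-zero vectors a and b, using Eqn (4-36)."""
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(x * x for x in b))
    # cos(theta) = l1*l2 + m1*m2 + n1*n2, where each factor is a direction cosine.
    cos_theta = sum((x / len_a) * (y / len_b) for x, y in zip(a, b))
    # Guard against rounding that pushes the value just outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

# Example 4-22: a = i + 2j + 3k and b = 2i - j - k.
theta = angle_between((1, 2, 3), (2, -1, -1))
print(round(theta, 3))
```

Because math.acos returns a value in the interval [0, pi], the convention adopted for the angle between two vectors is respected automatically.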

4-7 Scalar and vector products 

If $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ is an arbitrary vector and $\lambda$ is a scalar, then we have
already defined the product $\lambda\mathbf{a}$ to be the vector
$\lambda\mathbf{a} = \lambda a_1\mathbf{i} + \lambda a_2\mathbf{j} + \lambda a_3\mathbf{k}$.
Hence the effect of multiplying a vector by a scalar is to magnify the vector
without changing its direction, though its sense is reversed when $\lambda < 0$. The
result of this product is to generate a
vector. We must now discuss the multiplication of two vectors.

Here three-dimensional vector algebra differs radically from the vector 
algebra of complex numbers. With complex numbers there is only one 
multiplication operation defined, and the product of two complex numbers is 
always a complex number. In the case of vectors we shall see that two multi- 
plication operations are defined for a pair of vectors. One operation called a 
scalar product generates a scalar, whereas the other operation called a vector 
product generates a vector. The operation of division is not defined for vectors. 

The scalar product of two vectors is a generalization of the notion of the 
orthogonal projection of a line element onto another line and is suggested by 
Eqn (4-36). Its definition follows. 

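Definition 4-12 The scalar product of the two vectors $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$
and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$ is written a . b and is defined to be the scalar
quantity

$$\mathbf{a}\cdot\mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3.$$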

Because of the notation used, a scalar product is often colloquially 
called the dot product. Some books favour the notation (a, b) for the scalar 
product when it is then usually called the inner product of vectors a and b. 
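To exhibit the relation of a . b to Eqn (4-36) we first divide a . b by the product
of the moduli |a||b| to get

$$\frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}||\mathbf{b}|} = \left(\frac{a_1}{|\mathbf{a}|}\right)\left(\frac{b_1}{|\mathbf{b}|}\right) + \left(\frac{a_2}{|\mathbf{a}|}\right)\left(\frac{b_2}{|\mathbf{b}|}\right) + \left(\frac{a_3}{|\mathbf{a}|}\right)\left(\frac{b_3}{|\mathbf{b}|}\right).$$

Then, from the definition of direction cosines, we recognize that this may be
written

$$\frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}||\mathbf{b}|} = l_1 l_2 + m_1 m_2 + n_1 n_2, \tag{4-38}$$

where $\{l_1, m_1, n_1\}$ are the direction cosines of a and $\{l_2, m_2, n_2\}$ are the
direction cosines of b. If $\theta$ is the angle of inclination between a and b then, by
virtue of Eqn (4-36), expression (4-38) becomes

$$\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos\theta. \tag{4-39}$$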

This may be taken as an alternative definition of the scalar product a . b. 

Alternative Definition 4-13 The scalar product of the two vectors
a and b is written a . b and is defined to be the scalar quantity

$$\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos\theta,$$

where $\theta$ is the angle between the vectors.

Notice that it is a direct consequence of the definition that the scalar 
product of two vectors is commutative. That is, we have a . b = b . a for any 
two vectors a and b. 

Because of this property we shall sometimes, and without confusion,
write $\mathbf{a}^2$ with the understanding that $\mathbf{a}^2 = \mathbf{a}\cdot\mathbf{a}$. In practice Definition 4-12
is most used to find the scalar product since it relates the scalar product 
directly to the components of the vectors involved. The alternative form set 
out in Definition 4-13 is used to find the angle between the two vectors once 
the scalar product is known. 

Example 4-23 Find the scalar product of the vectors a = — 2i — 3j + k 
and b = — i + j + 3k and use the result to find the angle between a and b. 

Solution From Definition 4-12 we have a . b = (-2)(-1) + (-3)(1)
+ (1)(3) = 2. Now $|\mathbf{a}| = \sqrt{14}$ and $|\mathbf{b}| = \sqrt{11}$, so that substituting in
Definition 4-13 we have $2 = \sqrt{14}\,\sqrt{11}\cos\theta$ and hence $\cos\theta = 2/\sqrt{154}$,
or $\theta = \arccos(2/\sqrt{154})$.
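
The two definitions of the scalar product work together as in Example 4-23: Definition 4-12 supplies the number a . b and Definition 4-13 then yields the angle. A minimal Python sketch, with helper names dot and modulus chosen here for illustration, is as follows.

```python
import math

def dot(a, b):
    """Scalar product a . b = a1*b1 + a2*b2 + a3*b3 (Definition 4-12)."""
    return sum(x * y for x, y in zip(a, b))

def modulus(a):
    """Modulus |a|, the positive square root of a . a."""
    return math.sqrt(dot(a, a))

# Example 4-23: a = -2i - 3j + k and b = -i + j + 3k.
a = (-2, -3, 1)
b = (-1, 1, 3)

s = dot(a, b)
# Definition 4-13 rearranged: cos(theta) = a . b / (|a| |b|).
theta = math.acos(s / (modulus(a) * modulus(b)))
print(s, theta)
```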

Consider the scalar products of the unit vectors i, j, and k. Since these are
mutually orthogonal the angle between any two is $\pi/2$. It follows from
Definition 4-13 that the scalar product of any two different unit vectors from
this triad is zero. As each of the vectors i, j, and k is parallel to itself, when
forming the scalar product of one of these vectors with itself we must set
$\theta = 0$. Thus as |i| = |j| = |k| = 1, it follows from Definition 4-13 that
i . i = j . j = k . k = 1. In summary we have these important results, which
should be memorized since they are fundamental to everything that follows:

    i . i = j . j = k . k = 1,
    i . j = j . i = 0,
    i . k = k . i = 0,
    j . k = k . j = 0.

These results are conveniently combined in Table 4-2. Each entry is to be
interpreted as the scalar product of the vector at the left of the row of the
entry, with the vector at the top of the column of the entry.

Table 4-2 Table of scalar products of i, j, and k

    First       Second member
    member      i     j     k

    i           1     0     0
    j           0     1     0
    k           0     0     1

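The scalar product of two vectors may be deduced using Table 4-2 by
simple algebraic manipulation without the use of Definition 4-12. To see this
consider the vectors $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$. First
form their scalar product

$$\mathbf{a}\cdot\mathbf{b} = (a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k})\cdot(b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}),$$

and then expand the right-hand side as though ordinary algebraic quantities
were involved to obtain

$$\begin{aligned}
\mathbf{a}\cdot\mathbf{b} = {} & (a_1\mathbf{i})\cdot(b_1\mathbf{i}) + (a_1\mathbf{i})\cdot(b_2\mathbf{j}) + (a_1\mathbf{i})\cdot(b_3\mathbf{k}) \\
& + (a_2\mathbf{j})\cdot(b_1\mathbf{i}) + (a_2\mathbf{j})\cdot(b_2\mathbf{j}) + (a_2\mathbf{j})\cdot(b_3\mathbf{k}) \\
& + (a_3\mathbf{k})\cdot(b_1\mathbf{i}) + (a_3\mathbf{k})\cdot(b_2\mathbf{j}) + (a_3\mathbf{k})\cdot(b_3\mathbf{k}).
\end{aligned}$$

Next, recognizing that the scalars $a_i$, $b_j$ may be taken to the front of each
scalar product involved, rewrite the result thus:

$$\begin{aligned}
\mathbf{a}\cdot\mathbf{b} = {} & a_1 b_1\,\mathbf{i}\cdot\mathbf{i} + a_1 b_2\,\mathbf{i}\cdot\mathbf{j} + a_1 b_3\,\mathbf{i}\cdot\mathbf{k} + a_2 b_1\,\mathbf{j}\cdot\mathbf{i} + a_2 b_2\,\mathbf{j}\cdot\mathbf{j} \\
& + a_2 b_3\,\mathbf{j}\cdot\mathbf{k} + a_3 b_1\,\mathbf{k}\cdot\mathbf{i} + a_3 b_2\,\mathbf{k}\cdot\mathbf{j} + a_3 b_3\,\mathbf{k}\cdot\mathbf{k}.
\end{aligned}$$

Finally, using Table 4-2, this reduces to the desired result

$$\mathbf{a}\cdot\mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3.$$

In practice the intermediate working is always omitted and the result of a
scalar product is written on sight by retaining only the products involving
i . i, j . j, and k . k.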

Example 4-24 Determine the scalar products of these pairs of vectors : 

(a) a = i - 3j + k, b = -i + j - 3k; 

(b) a = 2i + j - k, b = -i + j - k; 

(c) a = 2i - j + 3k, b = -2i + j - 3k; 

(d) a = i + 2j - k, b = i + 2j - k. 



Solutions To show the application of scalar products of unit vectors we shall
retain the notation i . i, j . j, and k . k in the first part of each calculation to
indicate the origin of the terms involved. The terms involving products such
as i . j, i . k, . . ., will be omitted as these scalar products are zero. The result
will usually be written down on sight without any intermediate working.

(a) a . b = (i - 3j + k) . (-i + j - 3k)
          = (1)(-1) i . i + (-3)(1) j . j + (1)(-3) k . k
          = -1 - 3 - 3 = -7.

(b) a . b = (2i + j - k) . (-i + j - k)
          = (2)(-1) i . i + (1)(1) j . j + (-1)(-1) k . k
          = -2 + 1 + 1 = 0.
    Thus a and b are orthogonal.

(c) a . b = (2i - j + 3k) . (-2i + j - 3k)
          = (2)(-2) i . i + (-1)(1) j . j + (3)(-3) k . k
          = -4 - 1 - 9 = -14.

(d) a . b = (i + 2j - k) . (i + 2j - k)
          = (1)(1) i . i + (2)(2) j . j + (-1)(-1) k . k
          = 1 + 4 + 1 = 6.

Example (d) above is a special case of the scalar product of a vector with
itself and either from Definition 4-12 or 4-13 we see that for an arbitrary
vector a,

$$\mathbf{a}\cdot\mathbf{a} = |\mathbf{a}|^2. \tag{4-40}$$

In words, 'the scalar product of a vector with itself is equal to the square of
the modulus of that vector'. This simple result is often valuable when finding
a unit vector parallel to a given arbitrary vector a. To see how this comes
about, if we divide a by its modulus |a| to form the vector $\hat{\mathbf{a}} = \mathbf{a}/|\mathbf{a}|$, then
result (4-40) shows that $\hat{\mathbf{a}}\cdot\hat{\mathbf{a}} = 1$ and so $\hat{\mathbf{a}}$ is a unit vector.

Example 4-25 Find a unit vector $\hat{\mathbf{a}}$ parallel to the vector a = 3i - j - 2k.
Use the result to determine the projection of the vector b = 2i + 3j + k in
the direction of a.

Solution Here $|\mathbf{a}| = \sqrt{14}$ so that the desired unit vector is
$\hat{\mathbf{a}} = \mathbf{a}/\sqrt{14} = (3/\sqrt{14})\mathbf{i} - (1/\sqrt{14})\mathbf{j} - (2/\sqrt{14})\mathbf{k}$. Now the projection of
vector b along a is by definition the length l of vector b when projected
normally onto the line determined by a. Thus it is $l = |\mathbf{b}|\cos\theta$, where $\theta$ is the
angle between b and a. Since $|\hat{\mathbf{a}}| = 1$ we may write this as
$l = |\mathbf{b}||\hat{\mathbf{a}}|\cos\theta$ or, by Definition 4-13, as $l = \mathbf{b}\cdot\hat{\mathbf{a}}$. Hence in this problem
$l = (2\mathbf{i} + 3\mathbf{j} + \mathbf{k})\cdot\hat{\mathbf{a}} = 1/\sqrt{14}$.
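
The two steps of Example 4-25, forming a unit vector and then projecting onto it, may be sketched in Python as shown below. The helper names unit and projection are hypothetical and the vector a is assumed to be non-zero.

```python
import math

def dot(a, b):
    """Scalar product of two vectors given by their components."""
    return sum(x * y for x, y in zip(a, b))

def unit(a):
    """Unit vector parallel to the non-zero vector a, formed as a / |a|."""
    length = math.sqrt(dot(a, a))      # |a| from result (4-40)
    return tuple(x / length for x in a)

def projection(b, a):
    """Projection of b in the direction of a, that is b . (a / |a|)."""
    return dot(b, unit(a))

# Example 4-25: a = 3i - j - 2k and b = 2i + 3j + k.
u = unit((3, -1, -2))
p = projection((2, 3, 1), (3, -1, -2))
print(u, p)
```

Note that the projection so defined is a signed quantity: it is negative whenever the angle between b and a is obtuse.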




It follows from the definition of a scalar product of two vectors and from 
the properties of real numbers, that if a, b, and c are three arbitrary vectors, 
then 

a . (b + c) = a . b + a . c. 

This is the distributive law for the scalar product of vectors. 

Expressions of the form a . b . c, a . b . c . d, . . ., are meaningless since
the scalar product is only defined between a pair of vectors. Note also that
division by vectors is not defined, since although we may write a . b = n, it
makes no sense to write either a = n/. b or a . = n/b.

The other form of product of two vectors is the vector product. We shall
denote the vector product of vectors a and b by $\mathbf{a}\times\mathbf{b}$. Again because of the
notation this is often colloquially called the cross product of two vectors.
Other notations in use for the vector product are [a, b] and $\mathbf{a}\wedge\mathbf{b}$. In
preparation for the definition of $\mathbf{a}\times\mathbf{b}$ we now introduce a unit vector $\hat{\mathbf{n}}$ that is
normal (i.e. orthogonal) to the plane defined by the vectors a and b, and
whose sense is such that a, b, and $\hat{\mathbf{n}}$, in this order, form a right-handed set of
vectors. Here, although a, b, and $\hat{\mathbf{n}}$ are not necessarily mutually orthogonal,
we use right-handedness exactly as was defined at the start of Section 4-6.

Definition 4-14 The vector product of vectors a and b will be written
$\mathbf{a}\times\mathbf{b}$ and is defined to be the vector quantity

$$\mathbf{a}\times\mathbf{b} = |\mathbf{a}||\mathbf{b}|\sin\theta\,\hat{\mathbf{n}},$$

where $\theta$ is the angle between vectors a and b with $\sin\theta \ge 0$, and $\hat{\mathbf{n}}$ is a unit
vector normal to the plane of a and b such that a, b, and $\hat{\mathbf{n}}$, in this order, form
a right-handed set of vectors.

This shows that the vector $\mathbf{a}\times\mathbf{b}$ is normal to both a and b and has
magnitude $|\mathbf{a}||\mathbf{b}|\sin\theta$. The first interesting and unusual feature of this form
of product is that it is not commutative. If a, b, $\hat{\mathbf{n}}$, in this order, form a
right-handed set for the definition of $\mathbf{a}\times\mathbf{b}$, then for the definition of $\mathbf{b}\times\mathbf{a}$ it is
necessary to take for the right-handed set the vectors b, a, $-\hat{\mathbf{n}}$, in the stated
order. The immediate consequence is the important general result that if
a and b are arbitrary vectors, then

$$\mathbf{a}\times\mathbf{b} = -(\mathbf{b}\times\mathbf{a}). \tag{4-41}$$

In contrast with the scalar product, it is easily seen that the vector product
of parallel vectors is identically zero, whereas the vector product of non-zero
orthogonal vectors is non-zero. A simple calculation gives Table 4-3 of vector
products of the unit vectors i, j, and k. The left-hand column identifies the
first member of the vector product and the top row identifies the second member
of the vector product. The corresponding entry in the table gives the result of
the vector product. The entries along the diagonal are all seen to be the zero or
null vector.

Table 4-3 Table of vector products of i, j, and k

    First       Second member
    member      i     j     k

    i           0     k    -j
    j          -k     0     i
    k           j    -i     0

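If we take, for example, the first element in the left-hand column and the
last element in the top row, we see that $\mathbf{i}\times\mathbf{k} = -\mathbf{j}$. In many respects it is
easier to memorize these three results:

$$\mathbf{i}\times\mathbf{j} = \mathbf{k}, \quad \mathbf{j}\times\mathbf{k} = \mathbf{i}, \quad \mathbf{k}\times\mathbf{i} = \mathbf{j}, \tag{4-42}$$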

and then to use property (4-41), than to remember Table 4-3 complete. The 
order of the vectors occurring in these key relations can be remembered by 
making the cyclic permutations 

i j k 
j k i 
k i j 

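As with scalar products, this table of vector products may be used to
calculate the vector product of any two vectors expressed in component form.
Consider the vector product $\mathbf{a}\times\mathbf{b}$ where $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and
$\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$. Proceeding as though ordinary algebraic quantities
were involved we write

$$\begin{aligned}
\mathbf{a}\times\mathbf{b} = {} & (a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k})\times(b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}) \\
= {} & (a_1\mathbf{i})\times(b_1\mathbf{i}) + (a_1\mathbf{i})\times(b_2\mathbf{j}) + (a_1\mathbf{i})\times(b_3\mathbf{k}) \\
& + (a_2\mathbf{j})\times(b_1\mathbf{i}) + (a_2\mathbf{j})\times(b_2\mathbf{j}) + (a_2\mathbf{j})\times(b_3\mathbf{k}) \\
& + (a_3\mathbf{k})\times(b_1\mathbf{i}) + (a_3\mathbf{k})\times(b_2\mathbf{j}) + (a_3\mathbf{k})\times(b_3\mathbf{k}),
\end{aligned}$$

working on the assumption that vector multiplication is distributive over
addition. Next we recognize that the scalars $a_i$, $b_j$ may be taken out in front
of each vector product that is involved so that the expression becomes

$$\begin{aligned}
\mathbf{a}\times\mathbf{b} = {} & a_1 b_1\,\mathbf{i}\times\mathbf{i} + a_1 b_2\,\mathbf{i}\times\mathbf{j} + a_1 b_3\,\mathbf{i}\times\mathbf{k} + a_2 b_1\,\mathbf{j}\times\mathbf{i} + a_2 b_2\,\mathbf{j}\times\mathbf{j} \\
& + a_2 b_3\,\mathbf{j}\times\mathbf{k} + a_3 b_1\,\mathbf{k}\times\mathbf{i} + a_3 b_2\,\mathbf{k}\times\mathbf{j} + a_3 b_3\,\mathbf{k}\times\mathbf{k}.
\end{aligned}$$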



SEC 4-7 



SCALAR AND VECTOR PRODUCTS / 153 



Finally, using Table 4-3 and collecting together the i, j, and k terms, we 
obtain 

$$\mathbf{a} \times \mathbf{b} = (a_2 b_3 - a_3 b_2)\mathbf{i} + (a_3 b_1 - a_1 b_3)\mathbf{j} + (a_1 b_2 - a_2 b_1)\mathbf{k}.$$ (4-43)
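
The term-by-term expansion that leads to Eqn (4-43) can be checked numerically. The following is only a minimal Python sketch: the names `TABLE`, `cross_by_table`, and `cross_by_formula` are hypothetical, and the unit vectors i, j, k are assumed to be indexed 0, 1, 2. It encodes Table 4-3 and compares the distributive expansion with the component formula.

```python
# Illustrative sketch only; the names below are hypothetical, not from the text.
# Unit vectors i, j, k are indexed 0, 1, 2.
# TABLE[p][q] = (sign, index) encodes e_p x e_q = sign * e_index, as in Table 4-3;
# sign 0 marks the null vector on the diagonal.
TABLE = [
    [(0, 0), (1, 2), (-1, 1)],   # i x i = 0,  i x j = k,  i x k = -j
    [(-1, 2), (0, 0), (1, 0)],   # j x i = -k, j x j = 0,  j x k = i
    [(1, 1), (-1, 0), (0, 0)],   # k x i = j,  k x j = -i, k x k = 0
]


def cross_by_table(a, b):
    """Expand a x b term by term, assuming distributivity over addition."""
    result = [0, 0, 0]
    for p in range(3):
        for q in range(3):
            sign, index = TABLE[p][q]
            result[index] += sign * a[p] * b[q]
    return tuple(result)


def cross_by_formula(a, b):
    """Component formula (4-43)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)
```

Both functions should agree for any pair of component triples, since the second is just the first with the nine products collected by hand.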

This is often taken as the definition of the vector product a X b in place 
of our Definition 4-14. Expression (4-43) may be considerably simplified if 
the concept of a determinant is used. Before showing this we must digress 
slightly to define this term. 

Definition 4-15 Let a, b, c, and d be any four real numbers. Consider
the two-row by two-column array of these numbers

$$\begin{matrix} a & b \\ c & d. \end{matrix}$$ (A)

Define the expression

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix}$$ (B)

that is associated with this array by the identity

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = (ad - cb).$$ (C)

We define the second-order determinant associated with the array (A) to be
the number represented in symbols by (B) and having the value defined by
(C). The process of expressing the left-hand side of (C) in the form of the
right-hand side is called expanding the determinant.



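Example 4-26 Evaluate the second-order determinants

(a) $\begin{vmatrix} 1 & 3 \\ 7 & 9 \end{vmatrix}$; (b) $\begin{vmatrix} 0 & 4 \\ -1 & 2 \end{vmatrix}$; (c) $\begin{vmatrix} 2 & 1 \\ 6 & 3 \end{vmatrix}$.

Solution The values of the determinants follow directly from the definition:

(a) $\begin{vmatrix} 1 & 3 \\ 7 & 9 \end{vmatrix} = (1)(9) - (3)(7) = 9 - 21 = -12$;

(b) $\begin{vmatrix} 0 & 4 \\ -1 & 2 \end{vmatrix} = (0)(2) - (4)(-1) = 0 + 4 = 4$;

(c) $\begin{vmatrix} 2 & 1 \\ 6 & 3 \end{vmatrix} = (2)(3) - (1)(6) = 6 - 6 = 0$.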






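Definition 4-16 Let $a_i$, $b_i$, and $c_i$ with i = 1, 2, 3 be any set of nine
real numbers. Consider the three-row by three-column array of these numbers

$$\begin{matrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3. \end{matrix}$$ (A)

Define the expression

$$\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$$ (B)

that is associated with this array to be the single number that is determined by
the identity

$$\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = a_1 \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix} - a_2 \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix} + a_3 \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix}.$$ (C)

We define the third-order determinant associated with the array (A) to be the
number represented in symbols by (B) and having the value defined by (C).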

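Example 4-27 Evaluate the third-order determinant

$$\Delta = \begin{vmatrix} 3 & -2 & -7 \\ 2 & 1 & 2 \\ 2 & 1 & 1 \end{vmatrix}.$$

Solution From the definition,

$$\Delta = (3) \begin{vmatrix} 1 & 2 \\ 1 & 1 \end{vmatrix} - (-2) \begin{vmatrix} 2 & 2 \\ 2 & 1 \end{vmatrix} + (-7) \begin{vmatrix} 2 & 1 \\ 2 & 1 \end{vmatrix}.$$

Expanding the three second-order determinants and adding, we obtain the
desired result

$$\Delta = 3(1 - 2) + 2(2 - 4) - 7(2 - 2) = -7.$$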
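
Definitions 4-15 and 4-16 translate directly into a few lines of code. The following Python sketch is offered only as an illustration: the function names `det2` and `det3` are hypothetical, and each array is assumed to be given as a tuple of rows.

```python
# Illustrative sketch; det2 and det3 are hypothetical names, not from the text.

def det2(m):
    """Second-order determinant |a b; c d| = ad - cb (Definition 4-15)."""
    (a, b), (c, d) = m
    return a * d - c * b


def det3(m):
    """Third-order determinant expanded by its first row (Definition 4-16)."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = m
    return (a1 * det2(((b2, b3), (c2, c3)))
            - a2 * det2(((b1, b3), (c1, c3)))
            + a3 * det2(((b1, b2), (c1, c2))))
```

Calling `det3(((3, -2, -7), (2, 1, 2), (2, 1, 1)))` evaluates the determinant of Example 4-27 in terms of three calls to `det2`, exactly as in identity (C).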

It is helpful to classify determinants in some simple way, which the 
next definition achieves. 

Definition 4-17 We define the order of a determinant to be the number
of terms that lie on a diagonal drawn from the top left-hand corner to the
bottom right-hand corner. The values of these terms are immaterial.

Thus in Example 4-26 the determinants are second-order, whereas in 
Example 4-27 the determinant is third-order, and is evaluated in terms of 
three second-order determinants. 

We are now able to give the promised alternative definition of a vector 
product. 

Alternative Definition 4-18 We define the vector product a × b of
the two vectors $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$ to be the
formal expansion of the determinant

$$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}.$$

In this definition we have used the word 'formal' because, although the $a_i$
and $b_i$ are real numbers, the i, j, and k are unit vectors. Aside from this the
expansion of the third-order determinant is performed exactly as in Example
4-27.

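Example 4-28 Determine the vector product a × b where a = i + j − 2k
and b = −2i + 3j + k.

Solution To apply Definition 4-18 we first notice that the components
$a_1$, $a_2$, and $a_3$ of a are 1, 1, and −2 whilst the components $b_1$, $b_2$, and $b_3$ of b
are −2, 3, and 1. Hence

$$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 1 & -2 \\ -2 & 3 & 1 \end{vmatrix}$$

and so

$$\mathbf{a} \times \mathbf{b} = \mathbf{i} \begin{vmatrix} 1 & -2 \\ 3 & 1 \end{vmatrix} - \mathbf{j} \begin{vmatrix} 1 & -2 \\ -2 & 1 \end{vmatrix} + \mathbf{k} \begin{vmatrix} 1 & 1 \\ -2 & 3 \end{vmatrix} = 7\mathbf{i} + 3\mathbf{j} + 5\mathbf{k}.$$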

This effectively demonstrates that for most practical purposes Definition 
4-18 involves the least manipulation. 

It is easily proved that the vector product is distributive, so that for any 
three vectors a, b, and c we always have 

a × (b + c) = a × b + a × c.

Indeed this is implied by the way in which Eqn (4-43) was derived. 

With the introduction of the vector product, mixed products of the form
a . (b × c) become possible. This type of product is known as a triple scalar
product and as it involves the scalar product of a with (b × c) it is seen to be
a scalar. If $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$, $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$, and
$\mathbf{c} = c_1\mathbf{i} + c_2\mathbf{j} + c_3\mathbf{k}$ then by combination of Definitions 4-12 and 4-18 we have

$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = (a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}) \cdot \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$$

or,

$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = a_1(b_2 c_3 - c_2 b_3) - a_2(b_1 c_3 - c_1 b_3) + a_3(b_1 c_2 - c_1 b_2).$$

The terms on the right-hand side of this expression are the result of expanding
(C) in Definition 4-16, so that they may be re-combined into a determinant to
give the general result

$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}.$$ (4-44)



By interchanging rows of the determinant it is readily shown that the 
dot . and the cross x in a triple scalar product may be interchanged so that 



$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = (\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c}.$$ (4-45)



Example 4-29 Evaluate the triple scalar product a . (b × c) given that
a = 2i + k, b = i + j + 2k, and c = −i + j.

Solution The components of a, b, and c are, respectively, (2, 0, 1), (1, 1, 2),
and (−1, 1, 0). Hence

$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = \begin{vmatrix} 2 & 0 & 1 \\ 1 & 1 & 2 \\ -1 & 1 & 0 \end{vmatrix} = 2 \cdot (-2) - 0 \cdot (2) + 1 \cdot (2) = -2.$$
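
Results (4-44) and (4-45) are easy to test on particular vectors. The sketch below is illustrative only; the helper names `dot`, `cross`, and `triple_scalar` are hypothetical, and vectors are assumed to be given as triples of components.

```python
# Illustrative sketch; dot, cross and triple_scalar are hypothetical helper names.

def dot(a, b):
    """Scalar product of two component triples."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Vector product from the component formula (4-43)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def triple_scalar(a, b, c):
    """Triple scalar product a . (b x c)."""
    return dot(a, cross(b, c))


a, b, c = (2, 0, 1), (1, 1, 2), (-1, 1, 0)   # data of Example 4-29
lhs = triple_scalar(a, b, c)                 # a . (b x c)
rhs = dot(cross(a, b), c)                    # (a x b) . c
```

For the data of Example 4-29 the two values `lhs` and `rhs` coincide, which is what the interchange of dot and cross in (4-45) asserts.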



As our next generalization, we notice that vector products of more than 
two vectors are defined provided the order in which these products are to be 
carried out is specified by bracketing. As a special case we have the triple 
vector product a x (b X c) of the three vectors a, b, and c which differs 
from the triple vector product (a x b) x c. The first expression signifies the 
vector product of a and (b x c), whilst the second signifies the vector product 
of (a X b) and c, and in general these are different vectors. 

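A straightforward application of Definition 4-18 establishes the following
useful identity from which some interesting results may be derived

$$\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = (\mathbf{a} \cdot \mathbf{c})\mathbf{b} - (\mathbf{a} \cdot \mathbf{b})\mathbf{c}.$$ (4-46)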




The details of the proof are left to the reader. 

Example 4-30 Demonstrate the difference between the triple vector products
a × (b × c) and (a × b) × c by making the identifications a = i, b = i + j,
c = k.

Solution By direct substitution we find that a × (b × c) = i × [(i + j) × k]
and so expanding this result by using Eqn (4-42) gives a × (b × c)
= i × [−j + i] = −k. Similarly, in the second case, (a × b) × c =
[i × (i + j)] × k = k × k = 0.
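
The two bracketings, and identity (4-46), can be compared numerically. The following Python sketch is illustrative only: the function names are hypothetical, and the vectors used are those of Example 4-30.

```python
# Illustrative sketch; all function names here are hypothetical.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def identity_rhs(a, b, c):
    """Right-hand side of (4-46): (a . c) b - (a . b) c."""
    ac, ab = dot(a, c), dot(a, b)
    return tuple(ac * bi - ab * ci for bi, ci in zip(b, c))


a, b, c = (1, 0, 0), (1, 1, 0), (0, 0, 1)   # a = i, b = i + j, c = k
first = cross(a, cross(b, c))               # a x (b x c)
second = cross(cross(a, b), c)              # (a x b) x c
```

Here `first` equals `identity_rhs(a, b, c)`, while `second` is the null vector, so the position of the brackets matters.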

4-8 Geometrical applications 

This section illustrates something of the application of vectors to elementary 
geometry, and gives some simple but useful results. First we consider the 
representation of a straight line in vector form, and then show how the single 
vector equation may be reduced to the more familiar set of three Cartesian 
equations. 

The straight line 

Consider the problem of determining the equation of a straight line given that 
it passes through the point A with position vector a relative to O, and is 
parallel to vector b. We shall denote the position vector of a general point P 
on the line by r as shown in Fig. 4-13. 





Fig. 4-13 Straight line through A parallel to b.

By the rules of vector addition we have 

OP = OA + AP 

or, 

r = a + AP. 

However, as the straight line through A is parallel to the free vector b, 




it follows that for any point P on the line there is a scalar λ such that we can
write AP = λb. Applying this result to the equation above we see that the
vector equation for the straight line becomes

$$\mathbf{r} = \mathbf{a} + \lambda\mathbf{b}.$$ (4-47)

The scalar λ in this equation is simply a parameter, and different values of
λ will determine different points on the line. To express this result in Cartesian
form, set $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$, $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$,
when Eqn (4-47) reduces to

$$x\mathbf{i} + y\mathbf{j} + z\mathbf{k} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k} + \lambda(b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}).$$

This vector equation implies three scalar equations by virtue of the equality
of its i, j, and k components. Hence we arrive at the three scalar equations

$x = a_1 + \lambda b_1$ (i-component)
$y = a_2 + \lambda b_2$ (j-component)
$z = a_3 + \lambda b_3$ (k-component).

If these are each solved for λ and equated, we obtain the more familiar result

$$\frac{x - a_1}{b_1} = \frac{y - a_2}{b_2} = \frac{z - a_3}{b_3}.$$ (4-48)

Equations (4-48) are the standard Cartesian form for the equations of a
straight line. Notice that the coefficients of x, y, and z in Eqn (4-48) are all
unity; that $b_1$, $b_2$, and $b_3$ are then the direction ratios of b and $a_1$, $a_2$, and $a_3$
define a point on the line. Equations (4-48) are sometimes expressed in the
form of three simultaneous equations relating x and y, x and z, and y and z.
This follows by cross-multiplying different pairs of expressions in Eqn (4-48).

Example 4-31 Find the vector equation of the line through the point with
position vector i + 3j − k which is parallel to the vector 2i + 3j + 4k.
Determine the point on the line corresponding to λ = 2 in the resulting
equation. Also express the vector equation of the line in standard Cartesian
form.

Solution From Eqn (4-47) we have

$$\mathbf{r} = (\mathbf{i} + 3\mathbf{j} - \mathbf{k}) + \lambda(2\mathbf{i} + 3\mathbf{j} + 4\mathbf{k})$$

or,

$$\mathbf{r} = (1 + 2\lambda)\mathbf{i} + 3(1 + \lambda)\mathbf{j} + (4\lambda - 1)\mathbf{k}.$$

This is the vector equation of the line, and setting λ = 2 determines the
point r = 5i + 9j + 7k. To express the equation of the line in Cartesian
form we appeal to Eqns (4-48) and use the fact that a = i + 3j − k and
b = 2i + 3j + 4k. Hence $a_1 = 1$, $a_2 = 3$, $a_3 = -1$, and $b_1 = 2$, $b_2 = 3$,
and $b_3 = 4$, so that the desired Cartesian equations are

$$\frac{x - 1}{2} = \frac{y - 3}{3} = \frac{z + 1}{4}.$$

As a check we can also use these equations to determine the point
corresponding to λ = 2. We must solve the three equations

$$\frac{x - 1}{2} = 2, \qquad \frac{y - 3}{3} = 2, \qquad \frac{z + 1}{4} = 2,$$

which give x = 5, y = 9, and z = 7. These are of course the coordinates of
the tip of the position vector r = 5i + 9j + 7k which confirms our previous
result.
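
The passage between the parametric form (4-47) and the Cartesian form (4-48) can be sketched in Python as follows. This is only an illustration: `point_on_line` and `parameter_of_point` are hypothetical names, and integer components are assumed so that exact fractions can be used.

```python
from fractions import Fraction

# Illustrative sketch; both function names are hypothetical.

def point_on_line(a, b, lam):
    """Point r = a + lam * b of Eqn (4-47)."""
    return tuple(ai + lam * bi for ai, bi in zip(a, b))


def parameter_of_point(a, b, r):
    """Return the common value of the ratios (x - a1)/b1, ... of Eqn (4-48),
    or None when r does not lie on the line. Integer components assumed."""
    lam = None
    for ai, bi, ri in zip(a, b, r):
        if bi == 0:
            # No ratio can be formed; the coordinate must simply equal a_i.
            if ri != ai:
                return None
            continue
        candidate = Fraction(ri - ai, bi)
        if lam is None:
            lam = candidate
        elif candidate != lam:
            return None
    return lam
```

With the data of Example 4-31, `point_on_line((1, 3, -1), (2, 3, 4), 2)` gives the point found in the text, and `parameter_of_point` recovers the parameter from that point.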

The same approach may be used if the line is required to pass through the
two points A and B with position vectors α and β, respectively. For then the
line passes through α and is parallel to the vector β − α which is just a
segment of the line itself. Hence we identify a with α and b with β − α, after
which the argument proceeds as before.

In the next example we illustrate how the non-standard Cartesian equa- 
tions of a straight line may be re-interpreted in vector form. 

Example 4-32 The equations

$$\frac{2x - 1}{3} = \frac{y + 2}{3} = \frac{-z + 4}{2}$$

determine a straight line. Express them in vector form and find the direction
ratios of the line.

Solution To express the equations in standard Cartesian form we must
first make the coefficients of x, y, and z each equal to unity. Hence we rewrite
the equations:

$$\frac{x - \tfrac{1}{2}}{3/2} = \frac{y + 2}{3} = \frac{z - 4}{(-2)}.$$

The vector a then has components $a_1 = \tfrac{1}{2}$, $a_2 = -2$, $a_3 = 4$ and the vector
b has the components $b_1 = 3/2$, $b_2 = 3$, $b_3 = -2$. These latter three numbers
are the desired direction ratios. The vector equation of the straight line itself is

$$\mathbf{r} = \tfrac{1}{2}(1 + 3\lambda)\mathbf{i} + (3\lambda - 2)\mathbf{j} + 2(2 - \lambda)\mathbf{k}.$$

(Why?)

On occasion it is necessary to determine the perpendicular distance p
from a point C with position vector c to the line L with equation r = a + λb.

Fig. 4-14 Perpendicular distance of point from line.

This can be done by applying Pythagoras' theorem in Fig. 4-14.
We have the obvious result

$$p^2 = (AC)^2 - (AB)^2$$

but AC = c − a so that $(AC)^2 = |AC|^2 = (\mathbf{c} - \mathbf{a}) \cdot (\mathbf{c} - \mathbf{a})$, whilst length
AB is the projection of AC onto the line L. Now the unit vector along L is
b/|b| so that AB = (c − a) . b/|b| and thus

$$(AB)^2 = \left( \frac{(\mathbf{c} - \mathbf{a}) \cdot \mathbf{b}}{|\mathbf{b}|} \right)^2.$$

Combining these results gives

$$p^2 = (\mathbf{c} - \mathbf{a}) \cdot (\mathbf{c} - \mathbf{a}) - \left( \frac{(\mathbf{c} - \mathbf{a}) \cdot \mathbf{b}}{|\mathbf{b}|} \right)^2,$$ (4-49)

from which p may be deduced.



Example 4-33 Find the distance of the point with position vector i + j + k
from the line r = (i + 2j + k) + λ(i − 2j + k).

Solution In the notation leading to Eqn (4-49) we have a = i + 2j + k,
b = i − 2j + k, and c = i + j + k. Hence c − a = −j and thus (c − a)
. (c − a) = (−j) . (−j) = 1. Also (c − a) . b = (−j) . (i − 2j + k) = 2 so
that $((\mathbf{c} - \mathbf{a}) \cdot \mathbf{b})^2 = 4$, whilst $|\mathbf{b}|^2 = 6$. Hence

$$\left( \frac{(\mathbf{c} - \mathbf{a}) \cdot \mathbf{b}}{|\mathbf{b}|} \right)^2 = \frac{4}{6}$$

and so from Eqn (4-49), $p^2 = 1 - \tfrac{2}{3} = \tfrac{1}{3}$ or $p = 1/\sqrt{3}$ as p is essentially
positive.

Fig. 4-15 Vector equation of a plane n . r = |n|p.
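
Formula (4-49) is straightforward to evaluate by machine. The sketch below is illustrative only; `distance_squared_to_line` is a hypothetical name, and integer components are assumed so that p² can be kept as an exact fraction.

```python
import math
from fractions import Fraction

# Illustrative sketch; the function names are hypothetical.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def distance_squared_to_line(a, b, c):
    """p^2 from Eqn (4-49) for the line r = a + lam * b and the point c.
    Integer components are assumed so that the result is an exact Fraction."""
    ca = tuple(ci - ai for ci, ai in zip(c, a))          # the vector c - a
    projection_squared = Fraction(dot(ca, b) ** 2, dot(b, b))
    return dot(ca, ca) - projection_squared


# Data of Example 4-33
p_squared = distance_squared_to_line((1, 2, 1), (1, -2, 1), (1, 1, 1))
p = math.sqrt(p_squared)   # p is taken positive
```

Keeping p² exact until the final square root avoids rounding in the subtraction, which can matter when the point lies very close to the line.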

The plane 

The equation of a plane is easily determined once it is recognized that a
plane Π is specified when one point on it is known, together with any vector
perpendicular to it. Such a vector, when normalized, is a unit normal to the
plane Π and is unique except for its sign. The ambiguity as to the sign of the
normal is, of course, because a plane has no preferred side. To derive its
equation consider Fig. 4-15.

Let r be the position vector relative to O of a point P on the plane Π, and
n be a vector normal to the plane directed through the plane away from O
so that the corresponding unit normal is $\hat{\mathbf{n}} = \mathbf{n}/|\mathbf{n}|$. Further, let the
perpendicular distance ON from the origin O to the plane be p. Then for all points P
we have (OP) cos θ = p, where θ is the angle between OP and the normal ON.
In terms of vectors this is

$$\frac{\mathbf{r} \cdot \mathbf{n}}{|\mathbf{n}|} = p,$$ (4-50)

which is just the vector equation of a plane. If the number p in Eqn (4-50) is
positive then the plane lies on the side of the origin towards which n is directed,
otherwise it lies on the opposite side.

To express result (4-50) in Cartesian form let $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ and the
unit normal $\hat{\mathbf{n}} = \mathbf{n}/|\mathbf{n}| = l\mathbf{i} + m\mathbf{j} + n\mathbf{k}$, where of course $l^2 + m^2 + n^2 = 1$.
Equation (4-50) becomes

$$lx + my + nz = p.$$ (4-51)




This is the standard Cartesian form of the equation of a plane. Any equation
of this form represents a plane having for its unit normal the vector $l\mathbf{i} + m\mathbf{j} + n\mathbf{k}$
and lying at a perpendicular distance p from the origin. If p = 0 the
plane passes through the origin.

Example 4-34 Find the Cartesian equation of the plane containing the point
(1, 2, 3) which is normal to the vector i + 2j + 2k.

Solution First we use Eqn (4-50) to determine p. Since the point (1, 2, 3)
lies in the plane, r = i + 2j + 3k is the position vector of a point in the plane.
The vector normal to the plane in this case is n = i + 2j + 2k, so that
|n| = 3 and the unit normal $\hat{\mathbf{n}} = \mathbf{n}/|\mathbf{n}| = (\mathbf{i} + 2\mathbf{j} + 2\mathbf{k})/3$. This shows that
$l = \tfrac{1}{3}$, $m = \tfrac{2}{3}$, $n = \tfrac{2}{3}$. Hence, substituting into Eqn (4-50),

$$\frac{(\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}) \cdot (\mathbf{i} + 2\mathbf{j} + 2\mathbf{k})}{3} = p$$

or p = 11/3. As p > 0, the plane must lie on the side of the origin towards
which n is directed. Substituting in Eqn (4-51) we find the desired Cartesian
form of the equation of the plane:

$$\tfrac{1}{3}x + \tfrac{2}{3}y + \tfrac{2}{3}z = \tfrac{11}{3}.$$

This equation could equally well be written in the non-standard Cartesian
form x + 2y + 2z = 11, though then the constant on the right-hand side is
no longer the perpendicular distance of the plane from the origin.
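
The reduction of a plane to the standard form (4-51) can be sketched as below. This is an illustration only: `plane_standard_form` is a hypothetical name, and the function does not reverse the normal when p turns out to be negative.

```python
import math

# Illustrative sketch; plane_standard_form is a hypothetical name.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plane_standard_form(point, normal):
    """Return (l, m, n, p) for the plane lx + my + nz = p of Eqn (4-51),
    given one point in the plane and any non-zero normal vector.
    The sign of p follows the sense of the supplied normal."""
    length = math.sqrt(dot(normal, normal))
    l, m, n = (component / length for component in normal)
    p = dot(point, normal) / length          # Eqn (4-50)
    return l, m, n, p


# Data of Example 4-34
l, m, n, p = plane_standard_form((1, 2, 3), (1, 2, 2))
```

Multiplying the returned coefficients through by |n| = 3 gives back the non-standard form x + 2y + 2z = 11 mentioned in the example.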



Simple geometrical considerations similar to those set out above, when
coupled with the scalar and vector product, enable various useful results to
be derived very quickly. For example, as the angle θ between two planes is
defined to be the angle between their unit normals $\hat{\mathbf{n}}_1$ and $\hat{\mathbf{n}}_2$ it follows that
θ may be obtained from the scalar product $\hat{\mathbf{n}}_1 \cdot \hat{\mathbf{n}}_2 = \cos\theta$. Also the line of
intersection of these two planes is perpendicular to both normals $\hat{\mathbf{n}}_1$ and $\hat{\mathbf{n}}_2$
and so is parallel to the vector t determined by the vector product $\mathbf{t} = \hat{\mathbf{n}}_1 \times \hat{\mathbf{n}}_2$.
Rather than elaborate on these ideas here, a number of problems are given
at the end of the chapter.

The sphere 

Consider a sphere of radius R with its centre at the point A with the position
vector a. Then if r is the position vector of any point on the surface of the
sphere, the modulus of the vector r − a must equal R. In terms of vectors the
equation of the sphere is

$$|\mathbf{r} - \mathbf{a}| = R$$

or, alternatively,

$$(\mathbf{r} - \mathbf{a}) \cdot (\mathbf{r} - \mathbf{a}) = R^2.$$ (4-52)

If, now, we expand this equation to get

$$\mathbf{r} \cdot \mathbf{r} - 2\mathbf{r} \cdot \mathbf{a} = R^2 - \mathbf{a} \cdot \mathbf{a},$$

and then set $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$, $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $R^2 - \mathbf{a} \cdot \mathbf{a} = q$,
we obtain the standard Cartesian form of the equation of a sphere

$$x^2 + y^2 + z^2 - 2a_1 x - 2a_2 y - 2a_3 z = q.$$ (4-53)

Example 4-35 Find the Cartesian form of equation of the sphere of radius 2
having its centre at a = i + j + 2k.

Solution As r = xi + yj + zk and a = i + j + 2k we have r − a
= (x − 1)i + (y − 1)j + (z − 2)k, whilst R = 2. Hence Eqn (4-52) becomes

$$(x - 1)^2 + (y - 1)^2 + (z - 2)^2 = 4,$$

which is the desired Cartesian form of the equation.

4-9 Applications to mechanics 

This section briefly introduces some of the many situations in mechanics 
that are best described vectorially. First is one of the simplest applications 
of vectors, that will already be familiar to the reader. 

Polygon of forces — resultant 

It is known from experiment that when forces $\mathbf{F}_1, \mathbf{F}_2, \ldots, \mathbf{F}_n$ act on a rigid
body through a single point O, their combined effect is equivalent to that of a
single force R, their resultant, which acts through the same point O and is
equal to their vector sum. Such a system of forces acting through a single
point is a concurrent system of forces. Thus we have

$$\mathbf{R} = \mathbf{F}_1 + \mathbf{F}_2 + \cdots + \mathbf{F}_n.$$ (4-54)

These forces are often represented in the form of a vector polygon of
forces as shown in Fig. 4-16, in which the senses of the forces $\mathbf{F}_i$ are all
similarly directed and are opposite to the sense of R.

Conversely, the vector polygon shows that the vector −R is the additional
force that is required to act through O in order to maintain the system of
forces in equilibrium.

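Fig. 4-16 Vector polygon.

Example 4-36 Forces $\mathbf{F}_1$, $\mathbf{F}_2$, and $\mathbf{F}_3$ have magnitudes $3\sqrt{3}$, $\sqrt{14}$, and $2\sqrt{6}$ lb
and act concurrently through a point O along the lines of the vectors
i + j + k, 3i − j + 2k, and −i + 2j + k, respectively. Find the force Q that
must act through O for the system to remain in equilibrium.

Solution This is a direct application of the last remark about the vector
polygon of forces, and the only problem is one of scaling. Let us agree that a
vector of unit modulus represents a force of 1 lb. From the conditions of the
question we see that $\mathbf{F}_1$, $\mathbf{F}_2$, and $\mathbf{F}_3$ are respectively directed along the unit
vectors

$$\hat{\mathbf{f}}_1 = \frac{1}{\sqrt{3}}(\mathbf{i} + \mathbf{j} + \mathbf{k}), \qquad \hat{\mathbf{f}}_2 = \frac{1}{\sqrt{14}}(3\mathbf{i} - \mathbf{j} + 2\mathbf{k}), \qquad \hat{\mathbf{f}}_3 = \frac{1}{\sqrt{6}}(-\mathbf{i} + 2\mathbf{j} + \mathbf{k}).$$

Using the scale factor we can use these to write

$$\mathbf{F}_1 = 3\sqrt{3}\,\hat{\mathbf{f}}_1 = 3\mathbf{i} + 3\mathbf{j} + 3\mathbf{k},$$

$$\mathbf{F}_2 = \sqrt{14}\,\hat{\mathbf{f}}_2 = 3\mathbf{i} - \mathbf{j} + 2\mathbf{k},$$

$$\mathbf{F}_3 = 2\sqrt{6}\,\hat{\mathbf{f}}_3 = -2\mathbf{i} + 4\mathbf{j} + 2\mathbf{k}.$$

Hence the resultant $\mathbf{R} = \mathbf{F}_1 + \mathbf{F}_2 + \mathbf{F}_3 = 4\mathbf{i} + 6\mathbf{j} + 7\mathbf{k}$. The force necessary
for equilibrium is Q = −R showing that Q = −4i − 6j − 7k.

As $|\mathbf{Q}| = \sqrt{101}$, it follows immediately that the desired force is $\sqrt{101}$ lb
and acts in the direction of the unit vector $\hat{\mathbf{q}}$, where

$$\hat{\mathbf{q}} = \frac{-1}{\sqrt{101}}(4\mathbf{i} + 6\mathbf{j} + 7\mathbf{k}).$$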
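
The scaling and summation used for concurrent forces can be sketched in Python. The code is illustrative only; `force_vector` and `resultant` are hypothetical names, and floating-point arithmetic is used, so the results are approximate.

```python
import math

# Illustrative sketch; force_vector and resultant are hypothetical names.

def force_vector(magnitude, direction):
    """Force of the given magnitude acting along the given direction."""
    length = math.sqrt(sum(x * x for x in direction))
    return tuple(magnitude * x / length for x in direction)


def resultant(forces):
    """Vector sum of a concurrent system of forces, as in Eqn (4-54)."""
    return tuple(sum(components) for components in zip(*forces))


# Data of Example 4-36 (magnitudes in lb)
forces = [
    force_vector(3 * math.sqrt(3), (1, 1, 1)),
    force_vector(math.sqrt(14), (3, -1, 2)),
    force_vector(2 * math.sqrt(6), (-1, 2, 1)),
]
R = resultant(forces)
Q = tuple(-x for x in R)                     # force needed for equilibrium
Q_magnitude = math.sqrt(sum(x * x for x in Q))
```

Up to rounding, `R` reproduces the resultant of the example and `Q_magnitude` its modulus.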



In many problems of statics the centroid or the centre of mass of a system 
of particles is of importance. We now define this concept in terms of vectors. 



Definition 4-19 The centre of mass of the system of masses $m_1, m_2, \ldots, m_n$
whose position vectors are $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ is at the point G, where G
has the position vector g determined by

$$\mathbf{g} = \frac{m_1\mathbf{a}_1 + m_2\mathbf{a}_2 + \cdots + m_n\mathbf{a}_n}{m_1 + m_2 + \cdots + m_n}.$$
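
As an illustration of Definition 4-19, the weighted average can be computed as follows. This is a minimal sketch: `centre_of_mass` is a hypothetical name, and the three masses and positions in the usage lines are made-up sample data, not taken from the text.

```python
# Illustrative sketch; centre_of_mass is a hypothetical name.

def centre_of_mass(masses, positions):
    """Position vector g of Definition 4-19 for point masses in three dimensions."""
    total = sum(masses)
    return tuple(
        sum(m * a[k] for m, a in zip(masses, positions)) / total
        for k in range(3)
    )


# Made-up sample data: masses 1, 2, 1 at the points shown.
g = centre_of_mass([1, 2, 1], [(0, 0, 0), (1, 0, 0), (0, 2, 0)])
```

For equal masses the function reduces to the ordinary average of the position vectors.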

Next we discuss simple problems about relative motions, and relative 
velocity. 

Relative velocity 

Problems involving the motion of one point relative to another, which is 
itself moving, occur frequently in mechanics and easily lend themselves to 
vector treatment. They are best illustrated by example but first we define 
relative velocity. 

Definition 4-20 The relative velocity of a point P with velocity u, relative
to the point Q with velocity v, is defined to be the velocity u − v.

Example 4-37 A man walks due east at 4 mile/h and his dog runs north- 
east at 12 mile/h. Find the velocity and speed of the man relative to his dog. 

Solution Let a unit vector denote a velocity of magnitude 1 mile/h and take
j pointing due north and i pointing due east.

Unit vectors in the directions of motion of the man and dog are then i and
$(\mathbf{i} + \mathbf{j})/\sqrt{2}$. The velocity u of the man is thus u = 4i and the velocity v of the
dog is $\mathbf{v} = 6\sqrt{2}(\mathbf{i} + \mathbf{j})$. Hence the velocity of the man relative to his dog is

$$\mathbf{u} - \mathbf{v} = 2(2 - 3\sqrt{2})\mathbf{i} - 6\sqrt{2}\,\mathbf{j}.$$

His relative speed is $|\mathbf{u} - \mathbf{v}| = (160 - 48\sqrt{2})^{1/2}$ mile/h.
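
The subtraction in the definition of relative velocity can be checked numerically. The sketch below is illustrative only; `relative_velocity` is a hypothetical name, i is taken to point east and j north, and the figures are those of Example 4-37.

```python
import math

# Illustrative sketch; relative_velocity is a hypothetical name.

def relative_velocity(u, v):
    """Velocity of P (velocity u) relative to Q (velocity v)."""
    return tuple(ui - vi for ui, vi in zip(u, v))


man = (4.0, 0.0)                                   # 4 mile/h due east
dog = (12 * math.cos(math.pi / 4),                 # 12 mile/h north-east
       12 * math.sin(math.pi / 4))
man_relative_to_dog = relative_velocity(man, dog)
relative_speed = math.hypot(*man_relative_to_dog)  # modulus of u - v
```

Swapping the arguments gives the velocity of the dog relative to the man, which is the negative of the vector above and has the same speed.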

Work done by a force 

The scalar product can be used to give a convenient representation of the
work W done by a force F that produces a displacement d of the particle on
which it acts. The work done by a force of magnitude |F| when it displaces a
particle through a distance |d| is defined as the product of the distance
moved and the component of force in the direction of the displacement.
Hence, as W is positive we have

$$W = |\mathbf{F}||\mathbf{d}||\cos\theta|,$$

where θ is the angle of inclination between F and d. So the final result is:

$$W = |\mathbf{F} \cdot \mathbf{d}|.$$ (4-55)

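Example 4-38 Calculate the work W done by a force F of 12 lb whose line
of action is parallel to 2i + 3j − 2k when it moves its point of application
through a displacement d of 4 ft in a direction parallel to −2i + j − 3k.

Solution The unit vectors parallel to the force F and displacement d are
$\hat{\mathbf{f}} = (2\mathbf{i} + 3\mathbf{j} - 2\mathbf{k})/\sqrt{17}$ and $\hat{\mathbf{d}} = (-2\mathbf{i} + \mathbf{j} - 3\mathbf{k})/\sqrt{14}$, respectively. Let $\hat{\mathbf{f}}$
denote a force of 1 lb and $\hat{\mathbf{d}}$ a displacement of 1 ft so that $\mathbf{F} = 12\hat{\mathbf{f}}$
$= (24\mathbf{i} + 36\mathbf{j} - 24\mathbf{k})/\sqrt{17}$ and $\mathbf{d} = 4\hat{\mathbf{d}} = (-8\mathbf{i} + 4\mathbf{j} - 12\mathbf{k})/\sqrt{14}$. Then
the work W that is done is

$$W = |\mathbf{F} \cdot \mathbf{d}| = \frac{|(24)(-8) + (36)(4) + (-24)(-12)|}{\sqrt{17}\,\sqrt{14}} = \frac{240}{\sqrt{238}} \text{ ft lb},$$

which is approximately 15.6 ft lb.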
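
Result (4-55) can be evaluated directly once the force and displacement are scaled to their stated magnitudes. The following Python sketch is illustrative only; `scaled` and `work_done` are hypothetical names, and the usage line takes its data from Example 4-38.

```python
import math

# Illustrative sketch; scaled and work_done are hypothetical names.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scaled(magnitude, direction):
    """Vector of the given magnitude along the given direction."""
    length = math.sqrt(dot(direction, direction))
    return tuple(magnitude * x / length for x in direction)


def work_done(force_magnitude, force_direction, distance, displacement_direction):
    """W = |F . d| as in Eqn (4-55)."""
    F = scaled(force_magnitude, force_direction)
    d = scaled(distance, displacement_direction)
    return abs(dot(F, d))


# 12 lb along 2i + 3j - 2k, displaced 4 ft along -2i + j - 3k
W = work_done(12, (2, 3, -2), 4, (-2, 1, -3))
```

Since |cos θ| cannot exceed 1, the value returned can never exceed the product of the two magnitudes, here 48 ft lb, which gives a quick sanity check on any hand calculation.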

We now turn to applications of the vector product. One of the easiest 
occurs in the determination of the angular velocity of a point rotating about a 
fixed axis. 

Angular velocity 

Consider a rigid body rotating with a constant spin Ω rad/s about a fixed
axis L. Fig. 4-17 represents a point P in such a body, having the position vector
d relative to a point O on the spin axis L. Point Q is the foot of the
perpendicular from P to the line L.

The vector $\boldsymbol{\Omega}$ parallel to L with magnitude Ω and sense determined by a
right-hand screw rule with respect to L and the direction of the spin Ω is
called the angular velocity of the body. The instantaneous linear velocity v
of point P with position vector d obviously has magnitude Ω . (QP) and is
directed along the tangent to the dotted circle in Fig. 4-17. It is easily seen
that we may rewrite this as

$$|\mathbf{v}| = |\boldsymbol{\Omega}||\mathbf{d}| \sin\theta,$$

where θ is the angle between $\boldsymbol{\Omega}$ and d, or as

$$\mathbf{v} = \boldsymbol{\Omega} \times \mathbf{d}.$$ (4-56)

The final two applications of the vector product involve the concept of the 
moment of a vector which is first defined and they require the use of a bound 
vector. 

Definition 4-21 We define M = d × Q to be the moment of vector Q
about the point O, where d is the position vector relative to O of any point on
the line of action of the bound vector Q.

This definition is illustrated in Fig. 4-18 in which the plane Π contains the
vectors d and Q and, by virtue of the definition of the moment, M is normal
to Π.

The natural mechanical applications of this definition are to the moment
of a force and to the moment of momentum about a fixed point. In both
situations the line of action of the vector whose moment is to be found is
important, as is its point of application in some circumstances.

Fig. 4-17 Angular velocity.

Fig. 4-18 Moment of a vector about O.

If Q is identified with a force F, then the expression

$$\mathbf{M} = \mathbf{d} \times \mathbf{F}$$ (4-57)

is the moment or torque of the force F about O. If the force is expressed in
lb and the displacement vector in ft, the units of torque are lb-ft. Similarly,
if Q is identified with the momentum mv of a particle of mass m moving with
velocity v, then the vector

$$\mathbf{M} = \mathbf{d} \times (m\mathbf{v}) = m\,\mathbf{d} \times \mathbf{v}$$ (4-58)

is the moment of momentum or the angular momentum of the particle about O.
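
The vector-product formulae of this section, v = Ω × d and M = d × Q, can be tried out on simple data. The sketch below is illustrative only: the function names are hypothetical and the numerical vectors are made-up sample values, not taken from the text.

```python
# Illustrative sketch; function names and sample vectors are hypothetical.

def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def linear_velocity(omega, d):
    """v = Omega x d, Eqn (4-56)."""
    return cross(omega, d)


def moment(d, q):
    """M = d x Q, the moment about O of the bound vector Q (Definition 4-21)."""
    return cross(d, q)


# Spin of 3 rad/s about the k axis; P is 1 unit from the axis at height 5.
v = linear_velocity((0, 0, 3), (1, 0, 5))

# Force 2j applied at two different points of the same line of action.
M_first = moment((1, 0, 0), (0, 2, 0))
M_second = moment((1, 4, 0), (0, 2, 0))
```

The two moments agree, illustrating why d may be the position vector of any point on the line of action, and the speed in the first case equals the spin multiplied by the distance of P from the axis.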

PROBLEMS 

Section 4-1 

4-1 Give a graphical representation of each of the following velocities by drawing
directed line elements. In each case indicate the sense of the vector with an
arrow:

(a) 4 ft/s in a north-east direction;

(b) 2.5 ft/s in a south-west direction;

(c) 5 ft/s due west. 

What velocities would these same directed line elements represent if the 
arrows were reversed ? 

4-2 Classify each of these quantities as scalar or vector: 

(a) volume; (b) length; (c) momentum = mass x velocity; (d) electric 
field; (e) speed; (f) acceleration; (g) density; (h) chemical concentration; 
(i) electrostatic capacity; (j) moment of a force. 




4-3 Find the roots of each equation:

(a) $x^2 = -36$; (b) $x^2 = -27$; (c) $x^2 = 25$; (d) $x^2 = -2$.

4-4 Find the roots of these quadratic equations:

(a) $x^2 + 3x + 3 = 0$; (b) $x^2 - 3x + 2 = 0$; (c) $x^2 + 4x + 5 = 0$.

4-5 By setting $x^2 = w$, reduce the following quartic equations to quadratic
equations, and hence obtain their roots:
(a) $x^4 + x^2 - 2 = 0$; (b) $x^4 + 5x^2 + 6 = 0$; (c) $x^4 - 5x^2 + 6 = 0$.

4-6 Find the real and imaginary parts of each of these complex numbers:

(a) z = 9 − 6i; (b) z = 32; (c) z = 14 + 2i; (d) z = 17i; (e) z = −3 + i.

4-7 Write the following numbers in real-imaginary form given that their real
and imaginary parts are:

(a) Re z = −11, Im z = 1; (b) Re z = 0, Im z = −3;

(c) Re z = 0, Im z = 0; (d) Re z = 4, Im z = 17.

Section 4-2 

4-8 Which of these complex numbers are equal?

$z_1 = 2 - i$, $z_2 = 1 - i$, $z_3 = 4 + i$, $z_4 = 1 - i$, $z_5 = 2 + i$, $z_6 = 2 - i$,
$z_7 = 1 - i$.

4-9 Given that the following complex numbers are equal, deduce the values of
a and b:

(a) 2 − 3i = 2 + ib; (b) a + 4i = 1 + ib;

(c) 3 + 7i = a + ib; (d) 5 + ia = b + 6i.

4-10 Use Definitions 4-2 and 4-3 together with the real number axioms to prove
that (a) $z_1 + z_2 = z_2 + z_1$ thereby showing that complex addition is
commutative and, (b) $z_1 - z_2 = -(z_2 - z_1)$.

411 Form the sums z\ + zz given that: 

(a) zi = 3 — /', Z2 = 4 + li\ 

(b) zi = -2 - 4/, z 2 = 2 + 3/'; 

(c) zi = 5 + 6/', Z2 = —5 — 6/'; 

(d) zi = 4 - 3/, z 2 = 2 + 3/. 

4-12 Form the differences z\ — Z2 given that: 

(a) zi = 2 + 6/, z 2 = 4 + 2i; 

(b) zi = -2 + /, z 2 = -2 + 2/; 

(c) zi = 4 + li, z 2 = 2 + 7/; 

(d) z\ = 3/, z 2 = 1 + 3/. 

4- 13 Form the products zizo given that: 

(a) zi = 1 + /, z 2 = 2 + 3/; 

(b) zi = 3 - 5;, z 2 = 3 + 5/'; 

(c) 2i = /', Z2 = 4 — 3/; 

(d) 2i = 2, z 2 = 9 - i. 

4-14 Evaluate (1 + 5 - (1 - if- 
1 1 



4-15 Evaluate 



(1 + i)« T (1 - ;) 4 




4- 16 Solve these equations for z: 

(a) 3z + (9 + 6/) = 7 + 3/; (b) 2z + (3 - 2i) = 3 - 2»; 

(c) 4z - (4 + 6/) = -3 + i; (d) 3z + (2 + /) - 3(1 + 2/) = 1 + /. 

4-17 Form the quotients zi/z2 given that: 

(a) zi = 3 + 2/, z 2 = 1 - /; (b) zi = 9 + 3/, z 2 = 3 + /; 

(c) zi = 8 + 4/', z 2 = 2 - 4;'. 

4-18 Solve these equations for z: 

(a) 2z(3 + = 2 + 3/; (b) 3z(l - 2/) = 1 + 4/; 

(c) 4z(l -0 = l + j; (d) 2z(4 + 0=1+ 4». 

419 Use Definition 4-4 and the real number axioms to prove that zizo = z«z\ 
thereby showing that complex multiplication is commutative. 

4-20 Use Definition 4-5 to prove that: 

<a)*-<7>; (b) (i)=# 

(c) £5) = (z)3; (d) (~ j = z lf2 . 

4-21 Use the real number axioms together with Definition 4-5 to prove that: 

(Zl + Z2 + • • ■ + Zn) = Zl + Z 2 + • • • + Z„. 

4-22 State which of the following polynomials have at least one real root and
which, if they have complex roots, will have them occur in complex conjugate
pairs. If no deductions can be made about the nature of the roots, then say so.

(a) $P(z) = z^5 + 16z^4 + z^2 + 3z + 1$;

(b) $P(z) = z^4 + 3z^3 + 2z^2 + 1$;

(c) $P(z) = z^7 + 5z^5 - 2z^2 + z + i$;

(d) $P(z) = z^3 - 6z^2 + 2z + 4$.

4-23 Given that z = 2 + 3i is a root of the polynomial
$P(z) = z^4 - 4z^3 + 12z^2 + 4z - 13$,

deduce the values of the other three roots. Factorize P(z) into linear and
quadratic factors with real coefficients.

4-24 Given that z = i is a root of the polynomial

$P(z) = z^5 - 2z^4 + 10z^3 - 20z^2 + 9z - 18$,

deduce the values of the other four roots. Factorize P(z) into linear and
quadratic factors with real coefficients.



Section 4-3 

4-25 Plot the following vectors z x and z 2 in the complex-plane and use geometrical 
methods to form their sum zi + z 2 and their difference zi - z 2 : 
(a) zi = 2 + 3/, z 2 = -1 + 2/; (b) zi = 3, z 2 = 4 - /• 

(c) zi = 4/, z 2 = 3 - 4/; (d) zi = -1 - 2/, z 2 = -1 + 2/. 

4-26 Find the modulus of each of these vectors: 

(a) 4 - 3/; (b) -2 + 3/; (c) 2 - 3»; (d) 3 + 4/; (e) 5i. 




4-27 Use Definitions 4-5 and 4-7 to prove that:

(a) $z\bar{z} = |z|^2$; (b) $|z_1 z_2| = |z_1| \cdot |z_2|$;
and give an inductive proof that

$$|z_1 z_2 \cdots z_n| = |z_1| \cdot |z_2| \cdots |z_n|.$$

4-28 Given that $z_1 = 3 + 4i$, $z_2 = 4 - 3i$, $z_3 = 2 + i$, and $z_4 = \sqrt{3} + i$, use the
results of the previous problem to compute $|z_1 z_2|$, $|z_1 z_2 z_3|$, and
$|z_1 z_2 z_3 z_4|$. Check your results by direct computation and compare the
relative labour of computation.

4-29 Use the properties of the complex conjugate operation to prove that for any
two complex numbers $z_1$ and $z_2$,

$$z_1\bar{z}_2 + \bar{z}_1 z_2 = 2\,\mathrm{Re}\, z_1\bar{z}_2.$$

Then, using this result together with the obvious inequality

$$|\mathrm{Re}\, z_1\bar{z}_2| \le |z_1 z_2|$$

and the identity

$$|z_1 + z_2|^2 = (z_1 + z_2)(\bar{z}_1 + \bar{z}_2),$$

prove the triangle inequality,

$$|z_1 + z_2| \le |z_1| + |z_2|.$$

4-30 Use the same form of argument as in Problem 4-29 together with the obvious
inequality $\mathrm{Re}\, z_1\bar{z}_2 \ge -|z_1 z_2|$ to prove

$$\bigl||z_1| - |z_2|\bigr| \le |z_1 + z_2|.$$

4-31 Give two examples in which the triangle inequality is strict (that is, the sign
≤ is replaced by <). Give two further examples in which it reduces to an
equality.

4-32 Give two examples in which the inequality $\bigl||z_1| - |z_2|\bigr| \le |z_1 + z_2|$ is
strict. Give two further examples in which it reduces to an equality.

Section 4-4 

4-33 Express these numbers in modulus-argument form: 

(a) z = -3 + 4/; (b) z = -3 - 4;; 

(c) z = -3 + 3/; (d) z = 2V3 - 2i. 

4-34 Express the following numbers z in real-imaginary form given that : 

(a) | z | = 4, arg z = |; (b) | z | = 2, arg z = ^; 

(c) | z | = 6, arg z = y ; (d) | z | = 3, arg z = ~ 

4-35 Use the modulus-argument representation of complex numbers to prove : 

(a) Zl Z2 • • • Zn = Zl . Z2 • • • Zn\ (b) (z") = (f)"; 



(c) 



and arg |-| = —arg; 




4-36 Given the following numbers z in real-imaginary form, compute the products 
iz. Plot the results in the complex-plane and verify that the effect of multi- 
plication by / is to rotate a vector anti-clockwise through an angle I * without 
change of size: 

z = 3 — 2/; z = —2 + i; z = /; z = — 1 — /. 

4-37 Form the products z\ zo and the quotients z\jzo of the following numbers 
expressed in modulus-argument form: 

(a) zi = 3(cos ,'jjr + /sin \tt)\ z 2 = |(cos }» + /sin \tt); 

(b) z\ — 4(cos \-n — /sin \-t)\ zi = 2(cos i^ + /sin Jtt); 

(c) zi = iHcos iw — / sin iw); z-i = 6(cos 3w/2 — / sin 3-n-/2). 

4-38 The second-order difference equation

$$a u_n + b u_{n-1} + c u_{n-2} = 0$$

has for its general solution the expression

$$u_n = A\lambda_1^n + B\lambda_2^n$$

whenever the characteristic equation

$$a\lambda^2 + b\lambda + c = 0$$

has the distinct real roots $\lambda_1$ and $\lambda_2$. If $b^2 - 4ac < 0$, so that the
characteristic equation has the complex conjugate roots λ and $\bar{\lambda}$, show that if $u_n$ is to
be real, then the constants A and B must also be complex conjugates. Hence
show that if $b^2 - 4ac < 0$ and |λ| = r, arg λ = θ, then the general solution
is expressible in the form

$$u_n = r^n(C\cos n\theta + D\sin n\theta),$$

where C and D are real arbitrary constants.

Find the general solution of the following difference equation, and hence
determine the particular solution appropriate to the stated initial conditions:

$$u_n - 3\sqrt{2}\,u_{n-1} + 9u_{n-2} = 0 \quad \text{with } u_0 = 1,\ u_1 = 3.$$

Section 4-5 

4-39 Use de Moivre's theorem to express sin 16 and cos 16 in terms of powers of 
sin and cos 0. 

4-40 Use de Moivre"s theorem to express sin 1 1 and cos 1 1 in terms of powers 
of sin 6 and cos 6. 

4-41 Evaluate z 20 when z = -\ '3 + /. 
4-42 Evaluate z u when z = I — i \ 3. 
4-43 Calculate the seventh roots of unity. 
444 Find the roots of the equation iv = (— /) 2 < 3 . 
4-45 Find the roots of the equation w = (1 + i\ 3) ,/4 . 
Section 4-6 

4-46 Construct the set of cyclic permutations of the four letters a, b, c, and d. 
4-47 Construct a table analogous to Table 4-1 for a left-handed system of axes.

4-48 Determine the lengths |OP| of the vectors OP given that O is the origin
and the points P are:




(a) (1,1,1); (b) (-2,1,3); (c) (-1, -1, -1); (d) (3, -2, -4). 

4-49 Find the lengths |OP|, the direction cosines and the angles $\theta_1$, $\theta_2$, $\theta_3$ of the
vectors OP, where the points P are:
(a) (2, -1, -1); (b) (4, 0, 2); (c) (-1, 2, 1).

4-50 Find the direction ratios, the direction cosines and the angles $\theta_1$, $\theta_2$, $\theta_3$ of
the vectors OP, where the points P are:

(a) (1, 1, 1); (b) (-1, 1, 1); (c) (2, 1, -1).

4-51 Determine the angles $\theta_1$, $\theta_2$, $\theta_3$ for the vectors with the direction cosines:

452 Given that a vector makes acute angles with each of the coordinate axes and 

that its direction cosines are Im, m, —pi\, deduce the value of m and hence 
find the angles. \ ^ 1 

4-53 Use the fact that a vector makes an acute angle with each of the coordinate
axes and that its direction ratios are 1, 2, 2 to determine the angles $\theta_1$, $\theta_2$,
and $\theta_3$ that it makes with the coordinate axes.

4-54 Determine the lengths |AB| of the vectors AB, given that the end points
A and B are:

(a) A = (1, 1, 1), B = (2,0, 1); 

(b) A = (2, -1,1), B = (-2, 2,2); 

(c) A = (-1,3, 1), B = (-2, -1,0). 

Use your results to determine the direction cosines for each of these vectors. 

4-55 Write down the position vectors OP in terms of the unit vectors i, j, k given 
that O is the origin and the points P are: 
(a) (1, 1, 1); (b) (-2, 3, 7); (c) (3, -1, 11); (d) (0, 1, 0).

4-56 Write down the x, y, and z-components of these vectors:

(a) 3i - 2j + k; (b) -i + 3j + 11k; (c) i - k; (d) j + 3k.

4-57 Form the vector a = αi + βj + γk, given that
(1 - α)i + 2βj + (2γ - 1)k = 2i + j + 3k.

4-58 Determine the values of α, β, and γ in order that:

(1 - a)i + W ~ * 2 )j + (3' ~ 2)k = li + 3j + 2k. 

4-59 Form the sum a + b and difference a - b of the vectors:

(a) a = 3i - 2j + k, b = -i - 2j + 3k; 

(b) a = -i + 2j - k, b = 2i - 4j + 2k; 

(c) a = 2j - 3k, b = 2i - j + k. 

4-60 Prove from the definitions of addition and subtraction of vectors that for 
any vectors a and b 
(a) a + b = b + a and (b) a - b = -(b - a). 

4-61 Find |AB| and the direction cosines of the vectors AB given that A and B 
are the points: 






(a) A = (1, 1, 1), B = (2, -1, 1); 

(b) A = (2,0,-1), B = (1,2,1); 

(c) A = (1, 1, 1), B = (-1,-1,-1). 

4-62 State which of the following pairs of vectors a and b are parallel and which 
are anti-parallel : 

(a) a = i - 3j + k, b = -4i + 12j - 4k; 

(b) a = -2i + 3j - k, b = 2i - 3j + k; 

(c) a = 4i - j - 3k, b = 8i - 2j - 6k; 

(d) a = i + 7j + k, b = 3i + 21j + 3k. 

Section 4-7 

4-63 Express the following vectors a as the product of a scalar and a unit vector: 

(a) a = 2i - j + 3k; (b) a = 3i - 3j + k; (c) a = -^-i + -j - ^k. 

4-64 Find the vectors AB, and their direction cosines given that A and B have 
position vectors a and b, respectively, where 

(a) a = 3i - 3j + 5k, b = i + 2j - k; 

(b) a = 2i + 2j + k, b = i + 3j + 2k. 

4-65 Verify the inequalities $\big||a| - |b|\big| \le |a + b| \le |a| + |b|$ for the pairs
of vectors:

(a) a = i - 2j - k, b = 2i - 3j + k; 

(b) a = 3i - 4j + k, b = 6i - 8j + 3k; 

(c) a = 2i + 3j - k, b = -6i - 9j + 3k. 

4-66 Find the angle between the vectors a and b where: 

(a) a = i + j + k, b = 2i + j - k; 

(b) a = -i + 2j + 2k, b = 2i — j — 2k. 

4-67 Give two examples of pairs of vectors that are orthogonal but are not parallel 
to the vectors i, j, or k. 

4-68 Give two different proofs of the fact that scalar multiplication of vectors is 
commutative by using the two alternative definitions of the scalar product. 

4-69 Find the scalar products a . b and hence find the angle between the vectors 
a and b given that : 

(a) a = 7i - 3j + k, b = -i + 2j + 2k; 

(b) a = 2i - 2j + k, b = — 3i — 3j + 4k; 

(c) a = i + 2j + 3k, b = -2i - 4j - 6k. 

4-70 Find unit vectors parallel to the vectors a where : 

(a) a = 2i - 2j + k; (b) a = -3i + j + 2k; (c) a = 7i - 2j - 3k. 

4-71 Prove the distributive law for the scalar product by using either definition of 
the scalar product. 

4-72 Form the vector products a x b if:

(a) a = i - 2j - 4k, b = 2i - 2j + 3k; 

(b) a = -i + 4j - k, b = 3i + 2j + 4k; 

(c) a = -2i + 4k, b = 3j - 2k. 






4-73 Evaluate the determinants: 



(a) 



2 1 


(b) 


4 16 


(c) 


2 


(d) 


3 9 


4 6 


' 


-2 6 


' 


16 


* 


1 3 



4-74 For what values of A, if any, do these determinants vanish: 



(a) 



4-75 Evaluate the determinants: 



A 2 


(b) 


A 2 


(0 


3 A 


(d) 


3 1 




3 2A 


' 


2 


' 



(a) 



2 1 1 
1 2 1 
1 1 1 



(b) 



3 4 5 


(c) 


2 2 1 


» 


1 2 





3 4 5 
3 1 2 
6 5 7 



3A 4 
2 -A 



4-76 For what values of A do the following determinants vanish : 
(a) 



A 1 2 


(b) 


1 A 1 


; 


2 2 1 





A 


1 


ic) 


2A 


1 


; 


1 


3 








1 


2 


A 


1 





1 


A 


1 



4-77 Use Definition 4-18 to prove that a x b = -(b x a) for arbitrary vectors
a and b.

4-78 Evaluate the vector products b x a given that : 

(a) a = 2i - j + 2k, b = -3i + 2j + k; 

(b) a = -i + j + k, b = 4i + 2j + 3k; 

(c) a = -i - j - k, b = 2i + 2j + 2k. 

4-79 Determine unit vectors that are normal to both vectors a and b when:

(a) a = 3i + 5j - 2k, b = i + j + k; 

(b) a = -4i + 2k, b = j - 3k. 

State whether the results are unique and, if not, in what way are they in- 
determinate. 

4-80 Use the definition of a vector product to prove that it is distributive and so
a x (b + c) = a x b + a x c.

4-81 Use Definition 4-18 of a vector product to prove that when a and b are
non-zero vectors, then a x b = 0 if, and only if, a and b are parallel.

4-82 Use Definition 4-18 to evaluate the vector products a x b given that:

(a) a = -i + 4j - 2k, b = 2i + 3j + k; 

(b) a = -2i - 3j + k, b = 6i + 9j - 3k; 

(c) a = 3i - k, b = 2j. 

Evaluate these same vector products using Table 4-3 and compare the effort 
involved. 

4-83 Verify the distributive property of the vector product:
a x (b + c) = a x b + a x c,
given that a = 2i + j - k, b = i - 2j + k and c = 3i - 2j + 3k.




4-84 Evaluate the triple scalar products a . (b x c) and (b x a) . c given that: 

(a) a = 2i - j - 3k, b = 3k, c = i + 2j + 2k; 

(b) a = i + 2j + k, b = 2i + j + k, c = 4i + 2j + 2k. 

4-85 Prove that if a, b, and c form three edges of a parallelepiped all meeting at a
common point, then the volume of this solid figure is given by |a . (b x c)|.
Deduce that the vanishing of the triple scalar product implies that the vectors
a, b, and c are co-planar (that is, all lie in a common plane).

4-86 Determine the vector products a x (b x c) given that : 

(a) a = 2i - j - 3k, b = 3i + j + k, c = -i + j + k; 

(b) a = -i + j - k, b = 2i - 2j + 2k, c = i + k. 

4-87 Prove that (a x b) x c = (a . c)b - (b . c)a. 



Section 4-8 

4-88 Find the vector equation of the line through the point with position vector
2i - j - 3k which is parallel to the vector i + j + k. Determine the points
corresponding to λ = -3, 0, 2 in the resulting equation.

489 v^ctortT-ll ? i at 'T ° f ^k 116 throUgh the P° intS A and B with Potion 
cSnS of rnis'lmV " " "* " = "' + j + * ° et ™ the *«*<» 

4-90 The equations 

3 *+ 3 -2y + 1 _ 2z + 6 
2 7 ~ 

SSefof rhe £? ^ **"" ta ' m ™** f ° m and find *e direction 

4-91 If the points A and B have position vectors a and b, and point C divides the
line AB in the ratio λ : μ, show that C has the position vector
(μa + λb)/(λ + μ), provided λ + μ ≠ 0.

4M S£in he V6 f ° r eqUati ,° n ° f thC Hne that P asses throu g h th « point A with 

rhere b n : e r: r 2j T 7k 2, an"d J c + = k _Tti ^ " ^ ** ^ " "^ 

4-93 Find ^he perpendicular^ distance of the point 2i + j + k from the „„e 

4-94 Find the perpendicular distance of the point i + 3j + 2k from the line 
2x - 1 y + 2 z- 1 

2 3 -~r~ 

4 ' 95 a^no'rmairf+jTk 11011 ° f "" *"" ^^ «* ^ * " J + 2k 

4-96 Find the Cartesian equation of the plane containing the point 3i - k and 
alw contauung the two vectors a, £ where a = i S + 2j I k and B = -1 




4-97 Find the angle between the two planes 

jc— l_y + 2_z— 3 

2 r~ ~ 3 

and 

2x-l _ y-l _ 2z + 1 

2 ~ 3 ~ 3 

4-98 Find the angle between the plane z = 2 and the plane 

x + 2 y - 1 z+ 1 



4-99 Let Π be the plane $r . \hat{n} = p$, where $\hat{n}$ is the unit normal to Π and p is its
perpendicular distance from the origin. By constructing a plane Π' parallel
to Π through point P with position vector a, show that the perpendicular
distance of P from Π is given by the expression $|a . \hat{n} - p|$. What form
would this expression take if the plane was expressed in the form r . n = q,
where |n| ≠ 1?

4-100 A line may be uniquely determined as the intersection of two planes

$$r . n_1 = p_1 \quad\text{and}\quad r . n_2 = p_2 \qquad (A)$$

where $n_1$ and $n_2$ are not necessarily unit vectors. The direction of the line is
normal to both $n_1$ and $n_2$ and so is parallel to $n_1 \times n_2$. Hence the line has the
equation $r = a + \lambda(n_1 \times n_2)$ where λ is a parameter and a is some point
common to the two planes in (A). Apply these arguments to obtain the vector
equation of the line determined by the planes

x + 2y - z = 3 and 2x + y + 2z = 1.

4-101 Find the Cartesian equation of the sphere of radius 3 about the centre 
a = 2i + 3j + k. 

4-102 Construct the Cartesian equation of the sphere of radius 4 that lies on the
side z > 0 of the plane z = 0 and is tangent to it at the point (3, 1, 0).

4-103 The inward drawn normal to a sphere of radius 2 at the point (1, 1, 2) on its
surface is n = 2i - j + k. Deduce its equation in Cartesian form.



Section 4-9 

4-104 Forces $F_1$, $F_2$, $F_3$, and $F_4$ have magnitudes $2\sqrt{6}$, $3\sqrt{5}$, 3, and 15 lb and act
concurrently through a point O along the lines of the vectors -i + 2j - k,
2i + k, 2j, and 4i + 3j, respectively. Find the resultant of these forces and
determine its magnitude in lb.

4-105 Forces 1, 2, and 3 act at one corner of a cube along the diagonals of the faces
meeting at that corner. Find the magnitude of their resultant, and its
inclination to the edges of the cube.

4-106 A sphere of 10-in radius and mass 20 lb has one end of a string 18 in long
attached to its surface and hangs at rest against a smooth vertical wall to
which the other end of the string is attached. The string has a tension T and
the wall exerts a normal reaction R at its point of contact with the sphere.
Use a vector triangle of forces to determine T and R.






4-107 Deduce that for three concurrent forces $F_1$, $F_2$, and $F_3$ to be in equilibrium
they must form a closed vector triangle of forces, and hence be coplanar. 
Use your result to prove Lami's theorem, which asserts that when three con- 
current forces are in equilibrium, the magnitude of each force is proportional 
to the sine of the angle opposite to it in the vector triangle of forces. 

4-108 Find the centre of mass of the masses 1, 3, 4, and 2 lb situated at points with 
the respective position vectors 3i — j + k, 2i + 2j + 2k, — i + 7j — k, and 
4i - 10k. 

4-109 Prove that the centre of mass of a system of masses is independent of the
choice of origin. 

(Hint : Choose a new origin O' with position vector b relative to the original 
origin O and apply the definition of centre of mass.) 

4-110 The velocity of a boat relative to the water is represented by 4i + 3j, and that 
of the water relative to the earth by 2i — j. What is the velocity of the boat 
relative to the earth if i and j represent velocities of 1 mile/h to the east and 
north, respectively? 

4-111 The point of application of the force 9i + 6j + 7k moves a distance 5 ft in
the direction of the vector 3i + j + 4k. If the modulus of the force vector is 
equal to the magnitude of the force in lb, find the work done. 

4-112 A body spins about a line through the origin parallel to the vector 2i - j + k
at 1 5 rad/s. Find the angular velocity vector Ω for the body and find the
instantaneous linear velocity of a point in the body with position vector 
i + 2j + 3k. 

4-113 Find the torque of a force represented by 3i + 6j + k about point O given
that it acts through the point with position vector — i + j + 2k relative to O. 

4-114 Masses 1, 3, and 2 units at the points specified by the position vectors 3i — k, 
2i - 3j + k, and i + j + k relative to point O have velocities represented by
2j + k, 3i + j + 2k, and i — j + k, respectively. Determine the vector sum 
of the moments of momentum of each of these masses about O. 



Differentiation of functions 
of one or more real 
variables 



5-1 The derivative 

The important branch of mathematics known as the calculus is concerned 
with two basic operations called differentiation and integration. These 
operations are related and both rely for their definition on the use of limits. 

The calculus was founded jointly, and independently, by Newton in 
England, and by his contemporary Leibnitz in Germany to whom we owe 
the essentials of our present day notation. In introducing the ideas underlying 
a derivative we shall make use of a simple dynamical problem in very much 
the same way that Newton did when first formulating his early ideas on 
differentiation. However we have the advantage of understanding the nature 
of a limit more clearly than was the case in his day, so that after presenting 
our heuristic argument, we shall quickly formalize it in terms of the ideas set 
down in Chapter 3. 

We shall consider how to define and determine the instantaneous speed 
of a point P moving in a non-uniform manner along a straight line. To be 
precise, we shall suppose that a fixed point O on the line has been selected, 
and that the distance s of point P from O at time t is determined by the
equation

$$s = f(t),$$

where f(t) is some suitable continuous function of t defined on some interval
J. Thus we know the position of P at a general time t, and are required to
use this information to define and find the speed of P at any given instant of 
time. When the motion of P is uniform, so that its displacement is proportional 
to the elapsed time, the familiar definition of speed as distance per unit time 
can be used. However if the motion is non-uniform we must consider the 
situation more carefully. We shall use intuition here and first consider the 
difference quotient 

$$\frac{f(t_2) - f(t_1)}{t_2 - t_1} \qquad (5\text{-}1)$$

in which $t_1$ and $t_2$ are two different times belonging to J.

It seems reasonable to suppose that if $t_2$ were to be taken sufficiently
close to $t_1$ then expression (5-1), which is the quotient of the finite distance
travelled and the elapsed time, would in some sense provide a measure of the
average speed of P in the small time interval $t_2 - t_1$. Even better would be the
idea that we compute the difference quotient (5-1) not for one time $t_2$ close
to $t_1$, but for a monotonic sequence $\{\tau_i\}$ of times having for its limit the time
$t_1$, which is not a member of the sequence. This last condition is necessary
because Eqn (5-1) is not defined if $t_2 = t_1$. Then if the sequence of difference
quotients corresponding to Eqn (5-1) has a limit we propose to call the value
of this limit the instantaneous speed $u(t_1)$ of P at time $t_1$.
Expressed in the symbolic form of Chapter 3 we may write

$$u(t_1) = \lim_{i\to\infty} \left[\frac{f(\tau_i) - f(t_1)}{\tau_i - t_1}\right]. \qquad (5\text{-}2)$$



This definition is obviously consistent with the case of the uniform motion 
of P, for then every difference quotient involved in the determination of the 
limit (5-2) would give the same constant value u, say. We will call this value u 
the constant speed of P. 

As the function f(t) is continuous it is clearly desirable that we define u
not in terms of the discrete variable $\tau_i$ but in terms of a continuous variable $\tau$.
Fortunately we can do this easily, for the conditions of the connecting
Theorem 3-6 are satisfied and allow us to rewrite Eqn (5-2) thus:

$$u(t) = \lim_{\tau\to t} \left[\frac{f(\tau) - f(t)}{\tau - t}\right]. \qquad (5\text{-}3)$$

We have now dropped the suffix 1 since $t_1$ was not specific and represented
any value of the time t belonging to J.

It should be appreciated that the limit u(t) in Eqn (5-3) is a number and 
not a ratio of quantities as were the members of the sequence used to define 
the limit. The instantaneous speed u(t) can be interpreted as the distance 
through which P would move in unit time if, during that time, it were to move 
at a constant speed equal to the value u(t). Because Eqn (5-3) is consistent 
with the notion of a constant speed, it is customary to omit the adjective 
'instantaneous' and to speak only of the speed of P. 

The limit involved in Eqn (5-3) is of the indeterminate type and it will be
our object to devise techniques for evaluating such limits for a wide class of
functions f(t). In trivial cases these may be determined by simple algebraic
considerations, as this example shows.

Example 5-1 Suppose that the distance of a point P from a fixed origin at
time t is determined by the equation $f(t) = kt^3$, where k is a constant with
dimensions (Length)(Time)$^{-3}$. Find the functional form of the speed u(t)
at time t, and determine its value when t = 4.

Solution We are here required to evaluate the limit

$$u(t) = \lim_{\tau\to t} \left[\frac{k(\tau^3 - t^3)}{\tau - t}\right],$$

which is the form assumed by Eqn (5-3) when $f(t) = kt^3$.

Using the identity $\tau^3 - t^3 = (\tau - t)(\tau^2 + \tau t + t^2)$ we may write

$$u(t) = \lim_{\tau\to t} \left[\frac{k(\tau - t)(\tau^2 + \tau t + t^2)}{\tau - t}\right] = \lim_{\tau\to t} k(\tau^2 + \tau t + t^2) = 3kt^2.$$

Thus the functional form of the speed is $u(t) = 3kt^2$, so that at t = 4 the
speed has the value u(4) = 48k.
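The limiting process of Example 5-1 can also be watched numerically. The following Python fragment is only an illustrative sketch: the value chosen for k and the particular sequence of times are assumptions made for the demonstration and are not part of the example.

```python
def difference_quotient(f, t, tau):
    """Average speed of P over the interval between t and tau (tau != t)."""
    return (f(tau) - f(t)) / (tau - t)

k = 2.0  # hypothetical value of the constant k, chosen only for illustration

def f(t):
    """Distance law of Example 5-1, f(t) = k t^3."""
    return k * t**3

t = 4.0
# tau approaches t through a monotonic sequence that never equals t
quotients = [difference_quotient(f, t, t + 10.0**(-i)) for i in range(1, 7)]

limit = 3 * k * t**2  # the value 48k obtained analytically in Example 5-1
errors = [abs(q - limit) for q in quotients]
```

Each entry of `errors` is smaller than the one before it, which is what the algebra predicts, since the quotient equals k(τ² + τt + t²) exactly and this differs from 3kt² by a quantity that vanishes with τ - t.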

It is often helpful to check the form of a result by means of dimensional 
analysis. This is achieved by representing the fundamental quantities of mass, 
length, and time occurring in expressions and equations by the symbols M, 
L, and T, and ignoring any purely numerical multipliers that may be involved. 
The equations then become identities between expressions of the form $L^p M^r T^s$,
where p, r, and s are real numbers. Quantities other than length, mass, and
time are represented as suitable combinations of these fundamental quantities.
Thus speed and acceleration would be written $LT^{-1}$ and $LT^{-2}$, respectively,
with no account being taken of their magnitudes. We illustrate this approach
with Example 5-1. By supposition k has dimensions $LT^{-3}$, so that from the
form of the solution we see that u(t) must have the dimensions $kT^2 = (LT^{-3})T^2
= LT^{-1}$, which are the dimensions of speed, as required.
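The bookkeeping of dimensional analysis amounts to adding exponents, and this can be sketched in a few lines of Python. The representation below, a triple of exponents (p, r, s) standing for $L^p M^r T^s$, and the helper names `dim`, `mul`, and `power` are assumptions introduced purely for illustration.

```python
from fractions import Fraction

def dim(L=0, M=0, T=0):
    """Exponent triple (p, r, s) representing L^p M^r T^s."""
    return (Fraction(L), Fraction(M), Fraction(T))

def mul(a, b):
    """Multiplying two quantities adds their exponents."""
    return tuple(x + y for x, y in zip(a, b))

def power(a, n):
    """Raising a quantity to the power n multiplies its exponents by n."""
    return tuple(x * n for x in a)

LENGTH = dim(L=1)
TIME = dim(T=1)

k_dim = mul(LENGTH, power(TIME, -3))      # k has dimensions L T^-3
speed_dim = mul(LENGTH, power(TIME, -1))  # speed has dimensions L T^-1
u_dim = mul(k_dim, power(TIME, 2))        # u(t) = 3 k t^2, numerical factor ignored
```

Comparing `u_dim` with `speed_dim` reproduces the check made above for Example 5-1.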



Fig. 5-1 Speed interpreted as a derivative. (Axes: distance against time t.)



There is a valuable graphical interpretation of the limit (5-3) shown in
Fig. 5-1 which is the graph of a function f(t) together with the chord PQ,
where P is the point $(t, f(t))$ and Q the point $(\tau, f(\tau))$.

The difference quotient within the brackets of Eqn (5-3) before the limit
is taken is the tangent of the angle QPR. In the limit as $\tau \to t$, the point Q
approaches the point P and the chord PQ approaches the tangent PS to the
curve y = f(t) at P. The value u(t) arrived at by considering the limit of the
difference quotient (5-3) is thus the tangent of the angle SPR and so is equal
to the gradient or slope of the curve y = f(t) at P. The number $u(t_1)$ evaluated
at any specific time $t = t_1$ is the derivative of f(t) with respect to t at $t = t_1$.
The limit u(t) as a function of t is simply called the derivative of f(t) with
respect to t and the operation of computing the derivative of a function is
called differentiation. A function that possesses a derivative at each point of an
interval is said to be differentiable in that interval. Hence in Example 5-1, the
derivative of $kt^3$ with respect to t at t = 4 is 48k, whereas the derivative of
$kt^3$ with respect to t is the function $3kt^2$. The function $kt^3$ is obviously
differentiable in any finite interval.

This heuristic approach has served to introduce the limiting arguments 
underlying the concept of a derivative, and we must now carefully reformulate 
these arguments and express them in general terms. We shall use the following 
key definitions. 

Definition 5-1 A function f(x) of the real variable x will be said to be
differentiable at $x_0$ if, and only if,

$$\lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0}$$

exists and is independent of the side from which x approaches $x_0$. More
generally, f(x) will be said to be differentiable in an interval J if it is
differentiable at each point of J. At any points of J for which the limit is not defined
the function f(x) will be said to be non-differentiable.

Definition 5-2 If f(x) is a differentiable function of the real variable
x at $x_0$, then the value of the expression

$$\lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0}$$

will be denoted by $f'(x_0)$ or $\left.\dfrac{df}{dx}\right|_{x = x_0}$, and we shall say that it is the derivative
of f(x) at $x = x_0$. If further we define y by the equation y = f(x), then we
can also write the derivative of f(x) at $x_0$ in the form $\left.\dfrac{dy}{dx}\right|_{x = x_0}$.

These definitions merely express in a more sophisticated way what is
usually put as follows.

Let y = f(x). Then if $\delta y$ is the increment in y occasioned by an increment
$\delta x$ in x, we have $y + \delta y = f(x + \delta x)$ and hence

$$\frac{\delta y}{\delta x} = \frac{f(x + \delta x) - f(x)}{\delta x}.$$

Thus at $x = x_0$,

$$\frac{\delta y}{\delta x} = \frac{f(x_0 + \delta x) - f(x_0)}{\delta x}$$

and so

$$\left.\frac{dy}{dx}\right|_{x = x_0} = \lim_{\delta x\to 0} \frac{f(x_0 + \delta x) - f(x_0)}{\delta x}.$$

To obtain the formulation of Definition 5-2 above, first write h in place
of $\delta x$ to obtain

$$\left.\frac{dy}{dx}\right|_{x = x_0} = \lim_{h\to 0} \frac{f(x_0 + h) - f(x_0)}{h}$$

and then write x in place of $x_0 + h$, so that $h = x - x_0$.

What does the requirement, that $\lim_{x\to x_0} \{[f(x) - f(x_0)]/(x - x_0)\}$ should
exist, actually mean? It is this. There is a number $f'(x_0)$ such that the left-
and right-hand limits of the function $\varphi(x) = [f(x) - f(x_0)]/(x - x_0)$ as x
approaches $x_0$ exist and are both equal to $f'(x_0)$. The function $\varphi(x)$ itself is
defined near but not at $x = x_0$ but has the property that $\lim_{x\to x_0} \varphi(x) = f'(x_0)$.
We shall use this idea together with Theorem 3-4 when we discuss the general
properties of derivatives of combinations of functions.

If in Definition 5-2 we write xo + h in place of x, and replace xo by x in 
the subsequent result, we may formulate this definition. 

Definition 5-3 If y = f(x) is a differentiable function of the real variable
x at all points of an interval J, then the derivative of f(x) in J is the function
denoted either by f'(x) or dy/dx and defined by

$$\frac{dy}{dx} = f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h}.$$

The operation of computing the derivative of a function is differentiation.

Let us now apply exactly the same arguments to Fig. 5-2 as were used in
connection with the speed at a point of the particle trajectory in Fig. 5-1.
This time the graph represents any function y = f(x) satisfying the conditions
of Definition 5-3. Then if P is any point in the interval within which f(x) is
differentiable, and Q is an adjacent point, the chord PQ is, in some sense, an
approximation to the tangent line to the curve PR at P. The limiting position
of the chord PQ will lie along the tangent line to the curve at P and in terms
of angles we have $\lim_{h\to 0} \theta = \alpha$. However,

$$\frac{f(x + h) - f(x)}{h} = \tan\theta$$

so that

$$\lim_{h\to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h\to 0} \tan\theta$$

whence, finally,

$$f'(x) = \tan\alpha, \qquad (5\text{-}4)$$

or, equivalently,

$$\frac{dy}{dx} = \tan\alpha. \qquad (5\text{-}5)$$

Fig. 5-2 Derivative interpreted as a gradient.



This result shows that we may interpret the derivative of a differentiable 
function at a point as the gradient of the tangent line drawn to the curve at 
that point. It is implicit in the definition that the tangent line so defined should 
be independent of whether Q approaches P from the left or right. 

The geometrical interpretation of a derivative allows us to see quite
clearly that in addition to the function needing to be continuous in the
neighbourhood of a point at which it is required to be differentiable, it also
needs a special kind of smoothness. Specifically, the left- and right-hand
tangents to the curve at the point in question must be one and the same.
Indeed, we could re-phrase our definition of differentiability in terms of the
equality of the left- and right-hand tangents at a point on the curve, just as we
did when dealing with continuity.

Fig. 5-3 Non-differentiable function at $x = x_1$ and $x = x_2$.

Consider the function y = f(x) shown in Fig. 5-3 and defined on the
interval $[x_0, x_3]$, but only continuous in the semi-open intervals $[x_0, x_2)$
and $(x_2, x_3]$.

Then, despite the fact that the function f(x) is continuous in $[x_0, x_2)$
and $(x_2, x_3]$, it is only possible to assert that tangent lines in the sense implied
by Definition 5-3 can be constructed for points in the open intervals $(x_0, x_1)$,
$(x_1, x_2)$, and $(x_2, x_3)$. No tangent line can be constructed at $x_2$ because of the
discontinuity; two tangent lines $l_1$ and $l_2$ can be constructed at point P
according as A and B approach P from the left and the right; whilst only
right- and left-hand tangents $l_3$ and $l_4$ can be constructed at the end points
$x_0$ and $x_3$ because the function f(x) is not defined outside $[x_0, x_3]$.
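The distinction between left- and right-hand tangents can be illustrated numerically. The sketch below is not taken from the text: it uses the function |x|, which has a corner at the origin, as an assumed stand-in for the behaviour at point P, and the helper name `one_sided_quotients` is hypothetical.

```python
def one_sided_quotients(f, x0, h):
    """Difference quotients of f at x0 taken from the left and from the right (h > 0)."""
    right = (f(x0 + h) - f(x0)) / h
    left = (f(x0 - h) - f(x0)) / (-h)
    return left, right

# |x| has a corner at the origin: the one-sided quotients disagree however small h is
left, right = one_sided_quotients(abs, 0.0, 1e-3)

# x^2 is smooth at the origin: both quotients shrink towards the common value 0
sq_left, sq_right = one_sided_quotients(lambda x: x * x, 0.0, 1e-3)
```

For |x| the two quotients are -1 and +1 for every h > 0, so no single tangent exists at the origin, whereas for x² both quotients tend to the same limit.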

We shall now show how Definition 5-3 may be used to determine the 
derivative of a function and also to prove its non-differentiability at a certain 
point. Our example is a continuous function whose behaviour is clear at all 
points other than the origin, at which the existence, or otherwise, of a tangent 
line to the curve cannot be deduced by inspection of its graph. 

Example 5-2 Prove that the function f defined by $f(x) = x\sin(1/x)$ for
$x \ne 0$ and $f(0) = 0$ is continuous in $(-\infty, \infty)$ and sketch its graph. Find its
derivative by use of Definition 5-3 and show that it is not differentiable at
the origin.

Fig. 5-4 The function y = x sin (1/x).

Solution Only the behaviour of f in the vicinity of the origin is in doubt here.
When $x \ne 0$ we may write $f(x) = [\sin(1/x)]/(1/x)$, showing that for large x,
f(x) behaves like $\lim_{h\to 0} (\sin h)/h = 1$. Conversely, as the origin is approached,
so $x \to 0$ and because sin (1/x) is bounded by ±1 it follows that $\lim_{x\to 0} f(x) = 0$.
The limit of the function f(x) at the origin is thus equal to the functional value
itself and so f(x) is continuous at the origin. It is clearly continuous elsewhere
since it is the product of two continuous functions. Hence it is everywhere
continuous and Fig. 5-4 shows its graph, which is symmetric about the y-axis
because f(x) is an even function.

We shall approach the differentiability question in two stages: first for
$x \ne 0$, and then for x = 0. Assuming $x \ne 0$ and making a direct application
of Definition 5-3 we obtain

$$f'(x) = \lim_{h\to 0} \left[\frac{(x + h)\sin\left(\dfrac{1}{x + h}\right) - x\sin\dfrac{1}{x}}{h}\right],$$

which we re-express as

$$f'(x) = \lim_{h\to 0} \left[\frac{(x + h)\sin\left\{\dfrac{1}{x}\left(1 + \dfrac{h}{x}\right)^{-1}\right\} - x\sin\dfrac{1}{x}}{h}\right].$$

Now for h close to zero we may use the binomial theorem together with our
'little oh' notation of Section 3-4, to write $[1 + (h/x)]^{-1} = 1 - (h/x) + o(h)$
as $h \to 0$, and hence

$$f'(x) = \lim_{h\to 0} \left[\frac{(x + h)\sin\left\{\dfrac{1}{x}\left(1 - \dfrac{h}{x} + o(h)\right)\right\} - x\sin\dfrac{1}{x}}{h}\right].$$

Next we write the argument of the sine function as
$[(1/x) - (h/x^2) + [o(h)]/x]$
and use the trigonometric expansion for the difference of two angles to obtain

$$f'(x) = \lim_{h\to 0} \left[\frac{(x + h)\left\{\sin\dfrac{1}{x}\cos\left(\dfrac{h}{x^2} - \dfrac{o(h)}{x}\right) - \cos\dfrac{1}{x}\sin\left(\dfrac{h}{x^2} - \dfrac{o(h)}{x}\right)\right\} - x\sin\dfrac{1}{x}}{h}\right].$$

Consider the behaviour of the terms comprising this quotient. If the first
and last terms are taken together then in the limit as $h \to 0$ they reduce to
the single term sin (1/x). The remaining term in the centre is

$$-\frac{(x + h)}{h}\cos\frac{1}{x}\,\sin\left(\frac{h}{x^2} - \frac{o(h)}{x}\right)$$

and since $x \ne 0$ is fixed, it follows from limit (3-9) that this reduces to

$$-\frac{1}{x}\cos\frac{1}{x}$$

as $h \to 0$.

Combining these two results we find that the derivative f'(x) is

$$f'(x) = \sin\frac{1}{x} - \frac{1}{x}\cos\frac{1}{x} \quad\text{for } x \ne 0.$$

Thus we have used Definition 5-3 to compute the derivative, and as this is
defined for all $x \ne 0$ it follows that y = x sin (1/x) is differentiable for all
such x.

Finally we must examine the behaviour of the derivative at the origin
using Definition 5-3. Setting x = 0 we obtain

$$f'(0) = \lim_{h\to 0} \frac{h\sin(1/h) - 0}{h} = \lim_{h\to 0} \sin(1/h).$$

As sin (1/h) oscillates boundedly with ever increasing frequency when
$h \to 0$, it follows that f'(0) is not defined. This establishes the
non-differentiability of f(x) at the origin as was required.
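Both conclusions of Example 5-2 can be checked numerically. The Python sketch below is only illustrative: the two sequences of h values and the sample point x = 0.5 are assumptions chosen for the demonstration. Along one sequence the difference quotient at the origin stays at +1 and along the other at -1, so no limit can exist, while away from the origin the quotient agrees with sin (1/x) - (1/x) cos (1/x).

```python
import math

def f(x):
    """The function of Example 5-2, with f(0) defined to be 0."""
    return x * math.sin(1.0 / x) if x != 0 else 0.0

def quotient_at_origin(h):
    """Difference quotient [f(h) - f(0)]/h, which equals sin(1/h)."""
    return (f(h) - f(0.0)) / h

# h -> 0 through values where sin(1/h) = +1, and through values where it is -1
peaks = [quotient_at_origin(2.0 / ((4 * n + 1) * math.pi)) for n in range(1, 6)]
troughs = [quotient_at_origin(2.0 / ((4 * n + 3) * math.pi)) for n in range(1, 6)]

# away from the origin the difference quotient approaches the derived formula
x, h = 0.5, 1e-6
numeric = (f(x + h) - f(x)) / h
exact = math.sin(1.0 / x) - math.cos(1.0 / x) / x
```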




We close this section by deducing the derivatives of some important 
elementary functions, and stating them as theorems. 

Theorem 5-1 The derivative of a constant function is zero.

Proof Let k be any constant and consider the function f(x) where f(x) = k
for all x. Then

$$\frac{f(x + h) - f(x)}{h} = \frac{k - k}{h} = 0 \quad\text{for all } x.$$

Hence

$$\lim_{h\to 0} \frac{f(x + h) - f(x)}{h} = 0 \quad\text{for all } x.$$

Theorem 5-2 If n is any positive integer, then the real function $y = x^n$
is differentiable everywhere and has the derivative $dy/dx = nx^{n-1}$. If m is
any negative integer, then the function $y = x^m$ is differentiable everywhere
except at the origin and has the derivative $dy/dx = mx^{m-1}$.

Proof We must first consider the limit of the difference quotient
$[(x + h)^n - x^n]/h$. By the binomial theorem we have

$$\frac{(x + h)^n - x^n}{h} = \frac{x^n + nx^{n-1}h + \dfrac{n(n-1)}{2!}x^{n-2}h^2 + \cdots + \dbinom{n}{r}x^{n-r}h^r + \cdots + h^n - x^n}{h}$$

$$= nx^{n-1} + \frac{n(n-1)}{2!}x^{n-2}h + \cdots + \binom{n}{r}x^{n-r}h^{r-1} + \cdots + h^{n-1}.$$

Now $\lim_{h\to 0} h = 0$ so $\lim_{h\to 0} h^r = 0$ for $1 \le r \le n - 1$, and so

$$\lim_{h\to 0} \binom{n}{r}x^{n-r}h^{r-1} = 0 \quad\text{for } 2 \le r \le n.$$

Consequently,

$$\lim_{h\to 0} \frac{(x + h)^n - x^n}{h} = nx^{n-1}.$$

This is defined for all finite x including x = 0 and so proves the first part of
the theorem. Next let m = -n. Then

$$\frac{(x + h)^m - x^m}{h} = \frac{(x + h)^{-n} - x^{-n}}{h} = \left(\frac{x^n - (x + h)^n}{h}\right)\frac{1}{x^n(x + h)^n}.$$

Now from our result above

$$\lim_{h\to 0} \frac{x^n - (x + h)^n}{h} = -nx^{n-1}$$

whilst

$$\lim_{h\to 0} (x + h) = x \quad\text{and so}\quad \lim_{h\to 0} (x + h)^n = x^n.$$

If $x \ne 0$,

$$\lim_{h\to 0} \frac{1}{x^n(x + h)^n} = \frac{1}{\lim_{h\to 0} x^n \cdot \lim_{h\to 0} (x + h)^n} = \frac{1}{x^{2n}}.$$

Thus

$$\lim_{h\to 0} \frac{(x + h)^m - x^m}{h} = -nx^{n-1} \cdot \frac{1}{x^{2n}} = -nx^{-n-1} = mx^{m-1}.$$

Hence we have proved that

$$\left.\frac{dy}{dx}\right|_{x = x_0} = nx_0^{n-1} \quad\text{for all } x_0$$

if n is a positive integer, and for all non-zero $x_0$ if n is a negative integer.
Later we shall prove this result for all real n. Henceforth we shall use the
result freely, irrespective of the value of n.
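Theorem 5-2 lends itself to a quick check in exact rational arithmetic, which avoids any rounding in the difference quotient. The fragment below is a sketch only: the sample point x = 3/2, the chosen exponents, and the helper name `power_quotient` are assumptions made for illustration.

```python
from fractions import Fraction

def power_quotient(x, n, h):
    """Difference quotient [(x + h)^n - x^n]/h, evaluated exactly."""
    return ((x + h)**n - x**n) / h

x = Fraction(3, 2)  # hypothetical non-zero sample point
gaps_by_n = {}
for n in (2, 5, -1, -3):
    expected = n * x**(n - 1)  # the derivative asserted by Theorem 5-2
    gaps_by_n[n] = [
        abs(power_quotient(x, n, Fraction(1, 10**j)) - expected)
        for j in (2, 4, 6)
    ]
```

For every exponent tried, positive or negative, the gap between the quotient and $nx^{n-1}$ shrinks as h decreases, in agreement with the theorem.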

theorem 5-3 The functions sin ax and cos ax of the real variable x, 
where a is any real number, are differentiable everywhere and 



$$\frac{d}{dx}(\sin ax) = a\cos ax, \qquad \frac{d}{dx}(\cos ax) = -a\sin ax.$$



Proof These results follow by applying Definition 5-3 and then using limits 
(3-9) and (3-10). Thus we have 

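$$\frac{d}{dx}(\sin ax) = \lim_{h \to 0} \frac{\sin a(x+h) - \sin ax}{h}$$

$$= \lim_{h \to 0} \left[\frac{\sin ax \cos ah + \cos ax \sin ah - \sin ax}{h}\right]$$

$$= \sin ax \lim_{h \to 0} \left(\frac{\cos ah - 1}{h}\right) + \cos ax \lim_{h \to 0} \left(\frac{\sin ah}{h}\right)$$

$$= 0 + a\cos ax.$$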

As this function is defined for all finite x, the first part of the required result 
has been established. The remainder of the proof follows exactly similar lines, 
and so will be omitted. 




Example 5-3 Find the derivatives of the following functions stating any 
point at which they are not differentiable. 

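(a) $f(x) = \begin{cases} 3 & \text{for } -\infty < x < 1 \\ 2 & \text{for } 1 < x < \infty. \end{cases}$

(b) $f(x) = x^5$ for all x.

(c) $f(x) = \begin{cases} x^{-3} & \text{for } x \neq 0 \\ 1 & \text{for } x = 0. \end{cases}$

(d) $f(x) = \sin 4x$.

(e) $f(x) = \cos 7x$.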

Solution (a) By virtue of Theorem 5-1, the function f(x) has a zero
derivative for all x except at the point x = 1 where it is not defined.

(b) From Theorem 5-2 we have $dy/dx = 5x^4$ for all x.

(c) From Theorem 5-2 we have $dy/dx = -3x^{-4}$ for $x \neq 0$, and the
derivative is not defined at x = 0.

(d) and (e) From Theorem 5-3 we have

$$\frac{d}{dx}(\sin 4x) = 4\cos 4x, \qquad \frac{d}{dx}(\cos 7x) = -7\sin 7x \quad \text{for all } x.$$

By now it is obvious that Definition 5-3 is a working definition that can 
be used. However, some better method than its direct application is obviously 
needed to compute derivatives of complicated functions. This requirement will 
be systematically pursued in the next section. 

5-2 Rules of differentiation 

The complicated functions that occur in mathematical and physical studies 
are invariably the result of forming sums, products, and quotients of simple 
algebraic and trigonometric functions. This suggests that our next task should 
comprise a general study of the operation of differentiation when applied to 
sums, products, and quotients of arbitrary differentiable functions. We will 
present our results in the form of basic theorems which must become 
thoroughly familiar to the reader. 

theorem 5-4 (differentiation of a sum) If f(x) and g(x) are real valued
functions of x, differentiable at $x_0$, and $k_1$ and $k_2$ are constants, then the
linear combination $k_1 f(x) + k_2 g(x)$ is also differentiable at $x_0$. Furthermore,

$$\left.\frac{d}{dx}\bigl(k_1 f(x) + k_2 g(x)\bigr)\right|_{x=x_0} = k_1 f'(x_0) + k_2 g'(x_0).$$

Proof Here we must apply Definition 5-3 to the linear combination
$k_1 f(x) + k_2 g(x)$. We obtain

$$\left.\frac{d}{dx}\bigl(k_1 f(x) + k_2 g(x)\bigr)\right|_{x=x_0} = \lim_{h \to 0} \frac{k_1 f(x_0+h) + k_2 g(x_0+h) - [k_1 f(x_0) + k_2 g(x_0)]}{h}$$

$$= k_1 \lim_{h \to 0} \frac{f(x_0+h) - f(x_0)}{h} + k_2 \lim_{h \to 0} \frac{g(x_0+h) - g(x_0)}{h}$$

$$= k_1 f'(x_0) + k_2 g'(x_0).$$

If f and g are both differentiable in some common interval J, then the
above argument when applied to each point of J yields the result

$$\frac{d}{dx}[k_1 f(x) + k_2 g(x)] = k_1 f'(x) + k_2 g'(x),$$

where x is any point of J. The constants $k_1$ and $k_2$ are often absorbed into
the functions f and g, when the result could be expressed 'the derivative of a
sum of functions is equal to the sum of their derivatives'. The task of showing
that this result is true for a linear combination of an arbitrary number of
differentiable functions is left to the reader as an exercise involving proof by
induction.

Example 5-4 Let us use Theorem 5-4 to compute the derivative of
$f(x) = \sin^2 x$.

Solution As it stands we cannot differentiate f(x). However by a well known
trigonometric identity we may transform f(x) to the form

$$f(x) = \tfrac{1}{2}(1 - \cos 2x),$$

when Theorem 5-4 becomes applicable. Then, using our earlier results
concerning the differentiation of a constant and of cos ax we find that

$$\frac{d}{dx}(\sin^2 x) = \frac{d}{dx}\left\{\tfrac{1}{2}(1 - \cos 2x)\right\}$$

$$= \frac{d}{dx}\left(\tfrac{1}{2}\right) - \frac{d}{dx}\left(\tfrac{1}{2}\cos 2x\right)$$

$$= -\tfrac{1}{2}\,\frac{d}{dx}(\cos 2x)$$

$$= -\tfrac{1}{2} \cdot (-2)\sin 2x$$

$$= 2\sin x \cos x.$$

theorem 5-5 (differentiation of a product) If f(x) and g(x) are
differentiable real valued functions at $x_0$, then so also is the product
function f(x)g(x). Furthermore,

$$\left.\frac{d}{dx}\bigl(f(x)g(x)\bigr)\right|_{x=x_0} = f'(x_0)g(x_0) + f(x_0)g'(x_0).$$

Proof Again we consider a difference quotient but this time, for economy
of expression, use the form of limit given in Definition 5-2. We have the
identity

$$\frac{f(x)g(x) - f(x_0)g(x_0)}{x - x_0} \equiv \left(\frac{f(x) - f(x_0)}{x - x_0}\right) g(x_0) + f(x)\left(\frac{g(x) - g(x_0)}{x - x_0}\right). \tag{I}$$

Now we wish to show that $\lim_{x \to x_0} f(x) = f(x_0)$. This would be true if f(x) were
continuous but we only know that it is differentiable and as yet do not
know that this implies continuity. We shall prove that it does. As f(x) is
differentiable at $x = x_0$ we must have

$$\frac{f(x) - f(x_0)}{x - x_0} = f'(x_0) + o(h) \quad \text{as } x \to x_0,$$

where $h = x - x_0$. Hence

$$f(x) - f(x_0) = (x - x_0)[f'(x_0) + o(h)] \quad \text{as } x \to x_0.$$

This implies that if x is taken sufficiently close to $x_0$ then the difference
$f(x) - f(x_0)$ can be made arbitrarily small. This is just our definition of
continuity and so we have proved that differentiability of f(x) at $x_0$ implies
its continuity at that point. Thus we are permitted to write

$$\lim_{x \to x_0} f(x) = f(x_0)$$

and, similarly,

$$\lim_{x \to x_0} g(x) = g(x_0).$$

Now

$$\lim_{x \to x_0} \left(\frac{f(x) - f(x_0)}{x - x_0}\right) = f'(x_0) \quad \text{and} \quad \lim_{x \to x_0} \left(\frac{g(x) - g(x_0)}{x - x_0}\right) = g'(x_0),$$

so, finally, taking the limit of (I) as $x \to x_0$, we obtain the result

$$\left.\frac{d}{dx}\bigl(f(x)g(x)\bigr)\right|_{x=x_0} = f'(x_0)g(x_0) + f(x_0)g'(x_0).$$

Again, if f and g are both differentiable in some common interval J
then, as before, we obtain the more general result

$$\frac{d}{dx}\bigl(f(x)g(x)\bigr) = f'(x)g(x) + f(x)g'(x) \quad \text{for } x \in J.$$

As an incidental detail of this proof we have shown that differentiability 
at a point implies continuity. This result is worth stating formally. 






theorem 5-6 If a real valued function f(x) is differentiable at the point
$x_0$, then it is also continuous there. The converse result is not true.

Proof It only remains to prove that the converse result is not true: namely,
that continuity does not imply differentiability. This has already been seen
in connection with Fig. 5-3, but let us give a specific example. Our final
assertion in Theorem 5-6 will be valid even if we can produce only one
example of a function that is continuous at a point but is not differentiable
there. Such an example used to prove the falsity of an assertion is a
counter-example, and in this case we choose the function f(x) = |x|. This is known
to be continuous at x = 0, but the derivative as defined in Definition 5-3
is not defined at the origin so the function is not differentiable at that point.
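
The counter-example f(x) = |x| can be examined numerically. The short Python
sketch below is an illustration under stated assumptions: the helper name
`one_sided_quotients` and the chosen step sizes are hypothetical and are not
taken from the text. It forms the difference quotient at the origin for a
positive and for a negative step, which is one way of seeing that no single
limiting value exists there.

```python
def one_sided_quotients(f, x0, h):
    """Return the difference quotients of f at x0 for the steps -h and +h."""
    right = (f(x0 + h) - f(x0)) / h
    left = (f(x0 - h) - f(x0)) / (-h)
    return left, right


if __name__ == "__main__":
    for h in (1e-1, 1e-4, 1e-8):
        left, right = one_sided_quotients(abs, 0.0, h)
        # |x| is continuous at 0 (f(h) -> f(0)), yet the two quotients disagree.
        print(f"h={h:.0e}: f(h)-f(0)={abs(h) - abs(0.0):.1e}, "
              f"left quotient={left:+.1f}, right quotient={right:+.1f}")
```

The left and right quotients stay at different values however small h is
taken, while f(h) - f(0) itself shrinks with h, in keeping with Theorem 5-6.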

Example 5-5 Differentiate the function $f(x) = \sin^2 x$ and compute $f'(\tfrac{1}{4}\pi)$.

Solution We express the function as a product and use Theorem 5-5.

$$\frac{d}{dx}(\sin^2 x) = \frac{d}{dx}(\sin x \cdot \sin x)$$

$$= \left[\frac{d}{dx}(\sin x)\right]\sin x + \sin x\left[\frac{d}{dx}(\sin x)\right]$$

$$= 2\sin x\left[\frac{d}{dx}(\sin x)\right]$$

$$= 2\sin x \cos x.$$

As would be expected, this verifies the result of Example 5-4. Finally,
using this expression we compute

$$\left.\frac{d}{dx}(\sin^2 x)\right|_{x=\frac{1}{4}\pi} = 2\sin\frac{\pi}{4}\cos\frac{\pi}{4} = 1.$$
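
The agreement between Examples 5-4 and 5-5 can be checked numerically. The
Python sketch below is illustrative only; the names `central_difference`,
`product_rule` and `sin_squared`, and the step h, are assumptions introduced
here. It evaluates the product rule of Theorem 5-5 for sin x . sin x at
$x = \tfrac{1}{4}\pi$ and compares it with a symmetric difference quotient
of $\sin^2 x$.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def product_rule(f, df, g, dg, x):
    """Evaluate f'(x)g(x) + f(x)g'(x), the right-hand side of Theorem 5-5."""
    return df(x) * g(x) + f(x) * dg(x)


def sin_squared(x):
    """The function of Examples 5-4 and 5-5."""
    return math.sin(x) ** 2


if __name__ == "__main__":
    x0 = math.pi / 4
    by_rule = product_rule(math.sin, math.cos, math.sin, math.cos, x0)
    by_quotient = central_difference(sin_squared, x0)
    print(f"product rule: {by_rule:.9f}   difference quotient: {by_quotient:.9f}")
```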
4 4 



Our next theorem is important and concerns the rule for differentiating a 
composite function or, more simply, the rule for the differentiation of a 
function of a function. 

theorem 5-7 (differentiation of composite functions) If g(x) is a real
valued differentiable function at $x = x_0$ and f(u) is a real valued differentiable
function at $u = g(x_0)$, then f[g(x)] is differentiable at $x = x_0$. Furthermore,

$$\left.\frac{d}{dx}\{f[g(x)]\}\right|_{x=x_0} = f'[g(x_0)] \cdot g'(x_0).$$



Proof We have the obvious result

$$\frac{f[g(x)] - f[g(x_0)]}{x - x_0} = \frac{f[g(x)] - f[g(x_0)]}{g(x) - g(x_0)} \cdot \frac{g(x) - g(x_0)}{x - x_0}. \tag{A}$$

Since g(x) is differentiable at $x_0$ it is continuous there, and so $g(x) \to g(x_0)$
as $x \to x_0$. So, writing $g(x) = u$, $g(x_0) = a$ we have

$$\frac{f[g(x)] - f[g(x_0)]}{x - x_0} = \frac{f(u) - f(a)}{u - a} \cdot \frac{g(x) - g(x_0)}{x - x_0}.$$

Now for ease of argument we shall assume the behaviour of g(x) to be
strictly monotonic in some neighbourhood of $x_0$, so that $g(x) = g(x_0)$ only
when $x = x_0$. In these circumstances the difference quotients on the right-hand
side of (A) are well defined as $x \to x_0$ so that we may take limits and obtain

$$\left.\frac{d}{dx}\{f[g(x)]\}\right|_{x=x_0} = \lim_{x \to x_0} \frac{f[g(x)] - f[g(x_0)]}{x - x_0}$$

$$= \lim_{u \to a} \left[\frac{f(u) - f(a)}{u - a}\right] \cdot \lim_{x \to x_0} \left[\frac{g(x) - g(x_0)}{x - x_0}\right]$$

$$= f'(a) \cdot g'(x_0)$$

$$= f'[g(x_0)] \cdot g'(x_0). \tag{B}$$



It is not difficult to show that the theorem is still true when g(x) is not
monotonic in some neighbourhood of $x_0$ and an infinite sequence of points
$\{x_i\}$ exists with limit point $x_0$ at all of which $g(x_i) = g(x_0)$.

All that is necessary here is to observe that if $x \to x_0$ through the
successive values $x_i$ of this sequence, then $g(x_i) - g(x_0) = 0$ and so

$$\frac{g(x_i) - g(x_0)}{x_i - x_0} = 0 \quad \text{for every } i.$$

Hence, by Theorem 3-6, it follows that

$$\left.\frac{d}{dx}\{g(x)\}\right|_{x=x_0} = 0.$$

However, by the same argument,

$$\frac{f[g(x_i)] - f[g(x_0)]}{x_i - x_0} = 0 \quad \text{for every } i,$$

showing that

$$\left.\frac{d}{dx}\{f[g(x)]\}\right|_{x=x_0} = 0,$$

and so result (B) is also valid in this case.

If (B) is true at each point of some interval J, then we have the general
result

$$\frac{d}{dx}\{f[g(x)]\} = f'[g(x)] \cdot g'(x).$$

When the substitution u = g(x) is made, this result can be written:

$$\frac{d}{dx}f(u) = \frac{df}{du}\,\frac{du}{dx}. \tag{5-6}$$

In this form the theorem is known as the chain rule for differentiation,
and it is this result that is most often found in textbooks. By repeated
application, the chain rule readily extends to enable the differentiation of more
complicated composite functions such as the triple composite function
f{g[h(x)]}, always provided the functions f, g, and h have suitable
differentiability properties. In this case, setting v = h(x) and u = g(v) result (5-6)
takes the form

$$\frac{d}{dx}f(u) = \frac{df}{du}\,\frac{du}{dv}\,\frac{dv}{dx}. \tag{5-7}$$

Further extensions of the same kind are obviously possible and are
left to the reader.



Example 5-6 Differentiate the following functions and find the values of
their derivatives at x = 1:

(a) $\sin(x^2 + 3)$;

(b) $(x^3 + x + 1)^{1/3}$;

(c) $\sin\sqrt{1 + x^2}$.

Solution (a) Set $u = x^2 + 3$ so that

$$\frac{d}{dx}[\sin(x^2 + 3)] = \frac{d}{dx}(\sin u).$$

From the chain rule:

$$\frac{d}{dx}[\sin(x^2 + 3)] = \frac{d}{du}(\sin u) \cdot \frac{du}{dx}.$$

Now $(d/du)(\sin u) = \cos u$, $du/dx = 2x$ so that

$$\frac{d}{dx}[\sin(x^2 + 3)] = (\cos u) \cdot 2x = 2x\cos(x^2 + 3).$$

Whence at x = 1,

$$\left.\frac{d}{dx}[\sin(x^2 + 3)]\right|_{x=1} = 2\cos 4.$$

(b) This time set $u = x^3 + x + 1$, so that

$$\frac{d}{dx}[(x^3 + x + 1)^{1/3}] = \frac{d}{dx}(u^{1/3}).$$

From the chain rule:

$$\frac{d}{dx}[(x^3 + x + 1)^{1/3}] = \frac{d}{du}(u^{1/3}) \cdot \frac{du}{dx}.$$

Hence as $(d/du)(u^{1/3}) = \tfrac{1}{3}u^{-2/3}$, $du/dx = 3x^2 + 1$ we obtain

$$\frac{d}{dx}[(x^3 + x + 1)^{1/3}] = \left(\tfrac{1}{3}u^{-2/3}\right) \cdot (3x^2 + 1) = \frac{3x^2 + 1}{3(x^3 + x + 1)^{2/3}}.$$

Thus when x = 1,

$$\left.\frac{d}{dx}[(x^3 + x + 1)^{1/3}]\right|_{x=1} = \frac{4}{3^{5/3}}.$$

(c) We must use the extension of the chain rule given in Eqn (5-7). Set
$v = 1 + x^2$ when $\sin\sqrt{1 + x^2} = \sin\sqrt{v}$, and $u = \sqrt{v}$ when
$\sin\sqrt{1 + x^2} = \sin u$. Then

$$\frac{d}{dx}[\sin\sqrt{1 + x^2}] = \frac{d}{dx}(\sin u) = \left[\frac{d}{du}(\sin u)\right]\frac{du}{dv}\,\frac{dv}{dx} = \cos u\,\frac{du}{dv}\,\frac{dv}{dx}.$$

However,

$$\frac{dv}{dx} = 2x \quad \text{and} \quad \frac{du}{dv} = \frac{1}{2\sqrt{v}} = \frac{1}{2\sqrt{1 + x^2}},$$

so that, combining all the results,

$$\frac{d}{dx}[\sin\sqrt{1 + x^2}] = \frac{x\cos\sqrt{1 + x^2}}{\sqrt{1 + x^2}}.$$

Whence at x = 1,

$$\left.\frac{d}{dx}[\sin\sqrt{1 + x^2}]\right|_{x=1} = \frac{\cos\sqrt{2}}{\sqrt{2}}.$$
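
The three values found in Example 5-6 can be compared with numerical
estimates. The Python sketch below is an illustration, not part of the
original solution: the names `central_difference` and `CASES`, and the step
h, are assumptions made here. Each entry pairs a function with the
closed-form value of its derivative at x = 1 obtained from the chain rule.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Each case: (function, value of the derivative at x = 1 from the chain rule).
CASES = {
    "a": (lambda x: math.sin(x ** 2 + 3), 2 * math.cos(4)),
    "b": (lambda x: (x ** 3 + x + 1) ** (1 / 3), 4 / 3 ** (5 / 3)),
    "c": (lambda x: math.sin(math.sqrt(1 + x ** 2)),
          math.cos(math.sqrt(2)) / math.sqrt(2)),
}

if __name__ == "__main__":
    for label, (f, exact) in CASES.items():
        estimate = central_difference(f, 1.0)
        print(f"({label}) chain rule: {exact:.8f}   numerical: {estimate:.8f}")
```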



theorem 5-8 (differentiation of a quotient) If f(x) and g(x) are real
valued differentiable functions at $x_0$ and $g(x_0) \neq 0$, then the quotient
f(x)/g(x) is differentiable at $x_0$. Furthermore

$$\left.\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]\right|_{x=x_0} = \frac{g(x_0)f'(x_0) - g'(x_0)f(x_0)}{[g(x_0)]^2}.$$



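Proof If we consider the quotient f(x)/g(x) to be the product of the two
functions f(x) and 1/g(x), we have by Theorem 5-5

$$\left.\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]\right|_{x=x_0} = \left[\frac{1}{g(x)}f'(x) + f(x)\frac{d}{dx}\left(\frac{1}{g(x)}\right)\right]_{x=x_0}.$$

Now we must compute (d/dx)(1/g). We set g(x) = u when, from the chain
rule,

$$\left.\frac{d}{dx}\left[\frac{1}{g(x)}\right]\right|_{x=x_0} = \left.\frac{d}{dx}\left[\frac{1}{u}\right]\right|_{x=x_0} = \left[-\frac{1}{u^2}\,\frac{du}{dx}\right]_{x=x_0} = \frac{-g'(x_0)}{[g(x_0)]^2}.$$

Hence, combining our results, we obtain the desired result

$$\left.\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]\right|_{x=x_0} = \frac{g(x_0)f'(x_0) - g'(x_0)f(x_0)}{[g(x_0)]^2}.$$

As in the other cases the general result follows when the conditions of
the theorem are satisfied throughout some interval J. It has the obvious
form

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x)f'(x) - g'(x)f(x)}{[g(x)]^2}.$$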



Example 5-7 Differentiate $(3x + 1)/(x^2 - 2)$ and determine the values of
x for which the derivative is not defined.

Solution Set $f(x) = 3x + 1$ and $g(x) = x^2 - 2$. Then $f'(x) = 3$ and
$g'(x) = 2x$ for all x, whilst $g(x) = 0$ for $x = \pm\sqrt{2}$. Hence applying
Theorem 5-8 we have

$$\frac{d}{dx}\left[\frac{3x + 1}{x^2 - 2}\right] = \frac{(x^2 - 2) \cdot 3 - (2x)(3x + 1)}{(x^2 - 2)^2} = -\left[\frac{3x^2 + 2x + 6}{(x^2 - 2)^2}\right],$$

provided $x \neq \pm\sqrt{2}$.
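
The quotient rule of Theorem 5-8 translates directly into a small function.
The Python sketch below is a hedged illustration: the function names
`quotient_rule`, `f`, `df`, `g`, `dg` and `closed_form` are assumptions
chosen here for Example 5-7, and the check for g(x) = 0 simply mirrors the
condition $g(x_0) \neq 0$ in the theorem.

```python
def quotient_rule(f, df, g, dg, x):
    """Evaluate [g(x)f'(x) - g'(x)f(x)] / [g(x)]^2 as in Theorem 5-8."""
    gx = g(x)
    if gx == 0:
        raise ZeroDivisionError("g(x) = 0, so Theorem 5-8 does not apply here")
    return (gx * df(x) - dg(x) * f(x)) / gx ** 2


# Numerator and denominator of Example 5-7 with their derivatives.
def f(x):
    return 3 * x + 1


def df(x):
    return 3.0


def g(x):
    return x * x - 2


def dg(x):
    return 2 * x


def closed_form(x):
    """The simplified derivative -(3x^2 + 2x + 6)/(x^2 - 2)^2 from the example."""
    return -(3 * x * x + 2 * x + 6) / (x * x - 2) ** 2


if __name__ == "__main__":
    for x in (0.0, 1.0, 3.0):
        print(x, quotient_rule(f, df, g, dg, x), closed_form(x))
```

The two printed columns should agree at every x other than $\pm\sqrt{2}$,
where the denominator vanishes and the derivative is not defined.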

To complete this section, Table 5-1 summarizes the results of differentiating
the trigonometric functions. Unfamiliar results may be deduced by directly
applying Theorem 5-8 to the definitions of the functions concerned.

Table 5-1 Derivatives of trigonometric functions

$$\begin{array}{lll}
\dfrac{d}{dx}(\sin x) = \cos x & \dfrac{d}{dx}(\cos x) = -\sin x & \dfrac{d}{dx}(\tan x) = \sec^2 x \\[2ex]
\dfrac{d}{dx}(\operatorname{cosec} x) = -\operatorname{cosec} x \cot x & \dfrac{d}{dx}(\sec x) = \sec x \tan x & \dfrac{d}{dx}(\cot x) = -\operatorname{cosec}^2 x
\end{array}$$
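
The entries of Table 5-1 can be spot-checked numerically. The following
Python sketch is only an illustration; the helper names (`central_difference`,
`cosec`, `sec`, `cot`, `TABLE`) and the sample point x = 0.7 are assumptions
made here. Each row pairs a trigonometric function with the derivative
listed in the table.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def cosec(x):
    return 1.0 / math.sin(x)


def sec(x):
    return 1.0 / math.cos(x)


def cot(x):
    return math.cos(x) / math.sin(x)


# (name, function, derivative as listed in Table 5-1)
TABLE = [
    ("sin", math.sin, math.cos),
    ("cos", math.cos, lambda x: -math.sin(x)),
    ("tan", math.tan, lambda x: sec(x) ** 2),
    ("cosec", cosec, lambda x: -cosec(x) * cot(x)),
    ("sec", sec, lambda x: sec(x) * math.tan(x)),
    ("cot", cot, lambda x: -(cosec(x) ** 2)),
]

if __name__ == "__main__":
    x = 0.7  # any point where all six functions are defined
    for name, func, deriv in TABLE:
        print(f"{name:5s} table: {deriv(x):+.8f}  numerical: {central_difference(func, x):+.8f}")
```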



5-3 Some important consequences of differentiability 

We preface this section by proving a result that belongs more properly to 
Chapter 3 since it depends for its validity only on the property of continuity. 
Our sole reason for discussing it here is to present it in the context in which it 
will first be used. It is usually known by the name of the intermediate value 
theorem and we shall now show that the idea underlying it is extremely 
simple. 

Consider the situation in which a recording thermometer attached to
some piece of equipment records its temperature at pre-assigned times.
Suppose, for instance, that at times $t_1$ and $t_2$ the temperatures recorded were
$T_1$ and $T_2$, respectively. Then although there is no record of the variation of
the temperature T(t) at times t between $t_1$ and $t_2$, it may be safely inferred
that the temperature will pass at least once through each intermediate value
between $T_1$ and $T_2$. It is quite possible for the temperature to assume values
that do not lie between $T_1$ and $T_2$, but no assertion can be made about such
an event. The situation is illustrated in Fig. 5-5 where $T^*$ is a typical
temperature intermediate between $T_1$ and $T_2$, and the dotted and solid lines
represent two possible temperature variations with time.

This physical situation is an example of the operation of the intermediate 
value theorem in everyday life, and we are able to make our assertion because 
we know from experience that however rapidly a temperature may change, 
it can never undergo an abrupt jump. In mathematical terms we are saying 
that temperature change must be a continuous process. Expressed like this 
the result seems obvious, but how may we prove it ? Our simple proof relies 
on the postulate of Section 3-2, which asserts that every bounded monotonic 
sequence tends to a limit, but first we state the formal result. 

theorem 5-9 (intermediate value theorem) Let the real valued function
f(x) be continuous on the closed interval [a, b] and such that $f(a) \neq f(b)$.
Then if $y^*$ is any number intermediate between f(a) and f(b), there exists a
number $x^*$ between a and b such that $y^* = f(x^*)$.






Proof Although a diagram is not essential for this proof, the representative
situation shown in Fig. 5-6 will be of help.

First set $x_1 = \tfrac{1}{2}(a + b)$, then if $f(x_1) = y^*$ the result is proved. If not
consider the intervals $(a, x_1)$, $(x_1, b)$. Then in one of these two intervals, $y^*$
will lie between the functional values occurring at either end of the interval.
Call this interval $I_1$ and let it be represented by the open interval $(a_1, b_1)$.
Thus in Fig. 5-6, $I_1$ is the right-hand interval and so in that case
$a_1 = \tfrac{1}{2}(a + b)$, $b_1 = b$.

Next set $x_2 = \tfrac{1}{2}(a_1 + b_1)$. If $f(x_2) = y^*$ the result is proved. If not
consider the intervals $(a_1, x_2)$, $(x_2, b_1)$. Then in one of these two intervals, $y^*$
will lie between the functional values occurring at either end of the interval.
Call this interval $I_2$ and let it be represented by the open interval $(a_2, b_2)$.
In Fig. 5-6 the interval $I_2$ is the left-hand sub-interval of $I_1$, so that
$a_2 = a_1$, $b_2 = \tfrac{1}{2}(a_1 + b_1)$.

We either prove the result directly for some $x_n$ or we define an infinite
sequence of open intervals $I_1 \supset I_2 \supset I_3 \supset \cdots$. Because each interval is
contained by all its predecessors it then follows that the sequence of numbers
$a_1, a_2, a_3, \ldots$ is monotonic increasing and bounded above whilst the
sequence of numbers $b_1, b_2, b_3, \ldots$ is monotonic decreasing and bounded
below. Hence by the postulate of Section 3-2, the sequences $\{a_i\}$ and $\{b_i\}$ both
tend to a limit. That they both tend to the same limit follows from the fact
that the length of the nth interval $I_n$ is $(b - a)/2^n$, which tends to zero as
$n \to \infty$. Letting the common value of these two limits be denoted by $x^*$
we have, by the continuity of f, $\lim_{n \to \infty} |f(a_n) - f(x^*)| = 0$ and similarly
$\lim_{n \to \infty} |f(b_n) - f(x^*)| = 0$. Since $y^*$ lies between $f(a_n)$ and $f(b_n)$ for every n,
it follows that $f(x^*) = y^*$, thereby showing the existence of the
required number $x^*$.

Fig. 5-5 Physical illustration of intermediate value theorem. (Axes: temperature against time t.)
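
The interval-halving argument of this proof is also a practical procedure.
The Python sketch below follows the same steps under stated assumptions: the
function name `bisect_intermediate_value`, the tolerance `tol`, the step
limit `max_steps` and the sample function `cube` are hypothetical choices
made here, and f is taken to be continuous on [a, b] with $y^*$ between f(a)
and f(b).

```python
def bisect_intermediate_value(f, a, b, y_star, tol=1e-12, max_steps=200):
    """Locate x* in [a, b] with f(x*) close to y_star by repeated halving.

    Assumes f is continuous on [a, b] and y_star lies between f(a) and f(b).
    """
    fa, fb = f(a), f(b)
    if not (min(fa, fb) <= y_star <= max(fa, fb)):
        raise ValueError("y_star must lie between f(a) and f(b)")
    if fa == y_star:
        return a
    if fb == y_star:
        return b
    for _ in range(max_steps):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fm == y_star or (b - a) < tol:
            return mid
        # Keep the half-interval whose end values still bracket y_star.
        if (fa - y_star) * (fm - y_star) < 0:
            b, fb = mid, fm
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)


def cube(x):
    """Sample continuous function used only for the demonstration."""
    return x ** 3


if __name__ == "__main__":
    x_star = bisect_intermediate_value(cube, 0.0, 2.0, 2.0)
    print(f"x* = {x_star:.12f}, f(x*) = {cube(x_star):.12f}")
```

Each pass halves the interval, so its length after n steps is $(b - a)/2^n$,
exactly as in the proof; the loop stops once that length falls below `tol`.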

The following is an obvious consequence of the intermediate value 
theorem : 



Corollary 5-9 Every function that is continuous in a closed interval attains 
both its greatest and least values at points of that interval. These values may 
occur at the end points of the interval. 




Fig. 5-6 Intermediate value theorem. 

5-3 (a) Maxima and minima 

One of the most familiar and useful applications of differentiation is to the
problem of determining those points in some interval [a, b] at which a
function f(x) assumes its maximum and minimum values. Collectively these
values are known as the extrema of the function f(x) on the interval [a, b] and
they are of various types as this definition indicates.

definition 5-4 (extrema) Let f(x) be a continuous function defined on
the interval [a, b] so that it attains its greatest and least values at points of
that interval. Then we say that the point $x_0$ belonging to [a, b] is:

(a) an absolute maximum if $f(x_0) > f(x)$ for all points x in [a, b];

(b) an absolute minimum if $f(x_0) < f(x)$ for all points x in [a, b];

(c) a relative maximum if $f(x_0 + h) - f(x_0) < 0$ for |h| sufficiently small;

(d) a relative minimum if $f(x_0 + h) - f(x_0) > 0$ for |h| sufficiently small.

No assumption of differentiability has been made when formulating this 
definition so that in Fig. 5-7, point P is an absolute maximum and both 
points R and T are relative maxima. Point Q is an absolute minimum and 
point S a relative minimum. Although the functional value at U lies inter- 
mediate between those at Q and S, it is not a relative minimum in the sense 
of the definition, because it lies at the end of the domain of definition [a, b] 
so that only the one-sided behaviour of the function is known there with 
respect to h. 




Fig. 5-7 Extrema of a function on [a, b]. 



If now, in addition to continuity, we also require of f(x) that it be
differentiable at the point $x_0$ occurring in Definition 5-4, we can easily devise a simple
test to identify the points where extrema must occur. Consider point P in
Fig. 5-7 as representative of a maximum at which the function is differentiable.
The fact that P happens to be an absolute maximum is immaterial for the
subsequent argument.

By supposition, if f is differentiable at P, the expression

$$f'(x_0) = \lim_{x \to x_0} \left[\frac{f(x) - f(x_0)}{x - x_0}\right]$$

must be independent of the manner of approach of x to $x_0$. Now for maxima
of types (a) and (c) we have $f(x) - f(x_0) < 0$, and hence it follows that when
$x < x_0$, $f'(x_0)$ is the limit of an essentially positive function; whereas when
$x > x_0$, $f'(x_0)$ is the limit of an essentially negative function. Clearly this is
only possible if $f'(x_0) = 0$. We have thus proved that if f is differentiable at
$x_0$, then a necessary condition that f should have a maximum at $x_0$ is $f'(x_0) = 0$.



Similar reasoning establishes that the condition $f'(x_0) = 0$ is also a
necessary condition for the differentiable function f to have a minimum at $x_0$. To
show that the vanishing of the derivative f' at a point is not a sufficient
condition for that point to be an extremum, we appeal to a counter-example.
The function $f = x^3$ has a continuous derivative $f' = 3x^2$ which vanishes at
the origin. Nevertheless, f is negative for x < 0 and f is positive for x > 0,
thereby showing that despite the vanishing of the derivative, neither a
maximum nor a minimum of the function can occur at the origin. Later we
shall identify behaviour of this nature as typical of a point of inflection with a
horizontal tangent. Generally speaking, a point of inflection is a demarcation
point on the graph of a differentiable function separating a region of
convexity from a region of concavity. Collectively the points at which the
derivative vanishes, regardless of whether or not they are maxima, minima, or points
of inflection are called critical points or stationary points of the function.

Combining the previous results, and recalling that the condition that f be
differentiable at $x_0$ precludes behaviour of the type encountered at point T
in Fig. 5-7, we are able to formulate the following general result.

theorem 5-10 Let f be a real valued differentiable function on some
interval [a, b]. Then the stationary points of f are the numbers $\xi$ for which

$$f'(\xi) = 0.$$

Once the stationary points of a function have been determined it is 
necessary to examine the functional behaviour in the vicinity of each one in 
order to determine the nature of the point involved. An absolute maximum 
is identified from amongst the relative maxima by direct comparison of the 
functional values at the stationary points in question. A similar process 
identifies an absolute minimum. 

Example 5-8 Without appealing to graphical ideas, find the location and
nature of the extrema of the following two functions and determine if they
are differentiable at these points:

(a) $f(x) = \tfrac{1}{3}x^3 + 2x^2 + 3x + 1$;

(b) $f(x) = (2x - 5)x^{2/3}$.

Solution (a) The stationary points are determined by finding those values
$x = \xi$ for which the derivative f' vanishes.

Now $f' = x^2 + 4x + 3$ and so the desired stationary points are given by
the roots of the equation

$$\xi^2 + 4\xi + 3 = 0.$$

These roots are $\xi = -1$ and $\xi = -3$, and the functional values at the
respective points are $f(-1) = -\tfrac{1}{3}$ and $f(-3) = 1$. As the derivative f' is the
sum of continuous functions it is everywhere continuous, so that no cusp-like
behaviour with associated extrema as typified by point T in Fig. 5-7 can arise.
So the two points $\xi = -1$ and $\xi = -3$ are the only ones at which stationary
values can occur. An examination of the behaviour of the function near these
points will determine if these stationary values correspond to maxima,
minima, or points of inflection.

A sketch graph would quickly show that in fact $\xi = -3$ corresponds to
a local maximum and $\xi = -1$ to a local minimum, but we are specifically
required to establish these results by analytical means. How then can we do
this? The solution lies in a direct application of Definition 5-4, and we
illustrate the argument by considering the stationary point $\xi = -1$. To find
the behaviour of f close to $\xi = -1$ we shall set $x = -1 + h$, where h is
small, and substitute in f(x) to obtain

$$f(-1 + h) = \tfrac{1}{3}(-1 + h)^3 + 2(-1 + h)^2 + 3(-1 + h) + 1,$$

whence,

$$f(-1 + h) = -\tfrac{1}{3} + h^2 + \frac{h^3}{3}.$$

Now $f(-1) = -\tfrac{1}{3}$ so that we may also write this result in the form

$$f(-1 + h) - f(-1) = h^2\left(1 + \frac{h}{3}\right).$$

Clearly for |h| small, the right-hand side is essentially positive, and so we
have succeeded in showing that close to $\xi = -1$,

$$f(\xi + h) - f(\xi) > 0,$$

and so by Definition 5-4 (d) the stationary point $\xi = -1$, at which
$f(\xi) = -\tfrac{1}{3}$, is seen to be a local minimum. An exactly similar argument will
establish that the stationary point $\xi = -3$, at which $f(\xi) = 1$, is a local
maximum. These are only local extrema because it is possible to find values
of x for which f > 1 and $f < -\tfrac{1}{3}$.

Solution (b) This case is more complicated. We have

$$\frac{df}{dx} = 2x^{2/3} + \frac{2(2x - 5)}{3x^{1/3}}$$

showing that the stationary points of f are determined by the roots of the
equation

$$0 = 2\xi^{2/3} + \frac{2(2\xi - 5)}{3\xi^{1/3}}.$$

This has the single root $\xi = 1$ at which $f(1) = -3$, showing that the function
has only one stationary point. To determine the nature of this point let us set
$x = 1 + h$, where |h| is small, and substitute into f(x) to find

$$f(1 + h) = (2h - 3)(1 + h)^{2/3}.$$

Next we expand the factor $(1 + h)^{2/3}$ by the binomial theorem as far as
terms involving $h^2$ to obtain

$$f(1 + h) = (2h - 3)\left(1 + \tfrac{2}{3}h - \tfrac{1}{9}h^2 + O(h^3)\right)$$

or,

$$f(1 + h) = -3 + \tfrac{5}{3}h^2 + O(h^3).$$

Using the fact that $f(1) = -3$ this becomes

$$f(1 + h) - f(1) = \tfrac{5}{3}h^2 + O(h^3)$$

showing that close to $\xi = 1$, $f(\xi + h) - f(\xi) > 0$. Hence by Definition
5-4 (d), the stationary point $\xi = 1$ is seen to correspond to a local minimum.
Again, it is only a local minimum because for large negative x we have
$f < -3$.

We now observe that f' is defined for all x other than for x = 0, at which
point f(0) = 0. The behaviour of the function in the vicinity of the origin
needs examination since, as it is not differentiable there, Theorem 5-10 can
provide no information about that point. Set x = h, where h is small, and
substitute in f to get

$$f(h) = (2h - 5)h^{2/3}.$$

Now f(0) = 0, so that we may rewrite this as

$$f(h) - f(0) = (2h - 5)h^{2/3},$$

thereby showing that as the right-hand side is essentially negative for suitably
small h, close to $\xi = 0$ we have $f(\xi + h) - f(\xi) < 0$. From Definition 5-4 (c)
we now see that the origin is a local maximum, despite the fact that f is not
differentiable at that point. It is only a local maximum because for large
positive x we have f > f(0). For reference purposes the function is shown in
Fig. 5-8.

The method of classification of stationary points that we have just illus- 
trated is always applicable, though it provides more information than is 
often required. This is so because not only does it discriminate between 
maxima and minima, but it also provides the approximate behaviour of the 
function close to the point in question. We shall return to this problem later 
to provide much simpler criteria by which the nature of stationary points 
may be identified. 
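
The sign test of Definition 5-4 used in Example 5-8 can be imitated
numerically. The Python sketch below is an illustration under stated
assumptions: the names `classify`, `f_a` and `f_b` and the single fixed step
h are choices made here, and a fixed small h is only a rough substitute for
the phrase 'for |h| sufficiently small'. The power $x^{2/3}$ is evaluated as
$|x|^{2/3}$ so that it stays real for negative x.

```python
def classify(f, xi, h=1e-3):
    """Classify xi from the signs of f(xi + h) - f(xi) and f(xi - h) - f(xi)."""
    right = f(xi + h) - f(xi)
    left = f(xi - h) - f(xi)
    if left > 0 and right > 0:
        return "relative minimum"
    if left < 0 and right < 0:
        return "relative maximum"
    return "neither"


def f_a(x):
    """Function (a) of Example 5-8."""
    return x ** 3 / 3 + 2 * x ** 2 + 3 * x + 1


def f_b(x):
    """Function (b) of Example 5-8, with x^(2/3) taken as |x|^(2/3)."""
    return (2 * x - 5) * abs(x) ** (2.0 / 3.0)


if __name__ == "__main__":
    for label, func, point in (("a", f_a, -1.0), ("a", f_a, -3.0),
                               ("b", f_b, 1.0), ("b", f_b, 0.0)):
        print(f"({label}) x = {point:+.1f}: {classify(func, point)}")
```

Note that the test makes no use of the derivative, which is why it also
handles the origin in case (b), where the function is not differentiable.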

5-3 (b) Rolle's theorem 

One form of Rolle's theorem may be stated as follows. 

Fig. 5-8 $y = (2x - 5)x^{2/3}$.

theorem 5-11 Let f be a real valued function that is continuous on the
closed interval [a, b] and differentiable at all points of the open interval
(a, b). Then if f(a) = f(b) there is at least one point $x = \xi$ interior to (a, b)
at which $f'(\xi) = 0$.

Proof We know from Corollary 5-9 that a continuous function f(x) defined
on the closed interval [a, b] must attain its maximum value M and its
minimum value m at points of [a, b]. Then if m = M on [a, b], the function
f(x) = constant, and since the derivative of a constant is zero, the point
$x = \xi$ at which $f'(\xi) = 0$ may be taken anywhere within the interval.

If f(x) is not a constant function then $m \neq M$, and as f(a) = f(b) it
follows that at least one of the numbers m, M must differ from the value
f(a). We shall suppose that $M \neq f(a)$. Then clearly the value M must be
attained at some point $x = \xi$ interior to (a, b). As f is assumed to be
differentiable in (a, b) it follows that Theorem 5-10 must be applicable showing that
$f'(\xi) = 0$. A similar argument applies if $m \neq f(a)$. Geometrically this theorem
simply asserts that the graph of any function satisfying the conditions of the
theorem must have at least one point in the interval [a, b] at which the
tangent to the curve is horizontal.

If f is not differentiable at even one interior point of (a, b) then Rolle's
theorem cannot be applied. Our counter-example in this instance is the
simple function f(x) = |x| with $-1 \le x \le 1$. This function is everywhere
continuous, and is differentiable at all points other than at the origin, but
there is certainly no point $x = \xi$ on [-1, 1] at which f' = 0. The graph of
this function is shown in Fig. 5-9, with one of a function g(x) not satisfying
the conditions of the theorem but for which the result happens to be true.

Fig. 5-9 Counter examples for Rolle's theorem: (a) Rolle's theorem does not apply;
no point $\xi$ for which $f'(\xi) = 0$; (b) $g'(\xi) = 0$, but Rolle's theorem does not apply.

5-3 (c) Mean value theorems for derivatives 

Our most important application of Rolle's theorem will be in the proof of
the mean value theorem for derivatives. In a first account of the subject it is 
difficult to indicate just how valuable and powerful this deceptively simple 
theorem really is as an analytical tool. However something of its utility will, 
perhaps, be appreciated after studying the remainder of this chapter. First 
let us present an intuitive approach to the theorem. 

Consider Fig. 5-10 which represents a graph of a differentiable function
f(x) on the open interval (a, b). Then as P and S are the points (a, f(a)) and
(b, f(b)), the gradient m of the line PS is

$$m = \frac{f(b) - f(a)}{b - a}.$$

Now we may identify points Q and R, with respective x-coordinates $\xi$ and $\eta$
interior to (a, b), at which the tangent lines $l_1$ and $l_2$ to the graph are parallel
to PS, and so must also have the same gradient m. Then because of the
geometrical interpretation of the derivative f' as the gradient of the tangent
line, at either Q or R we may equate m and f'. If we confine attention to point
Q we have

$$\frac{f(b) - f(a)}{b - a} = f'(\xi),$$

where $a < \xi < b$. This is the form in which the mean value theorem for
derivatives, also known as the law of the mean, is usually quoted. In
geometrical terms the theorem asserts that there is always a point $(\xi, f(\xi))$ on
the graph of the function, with $a < \xi < b$, at which the tangent to the curve is
parallel to the secant line PS. The fact that the precise value of $\xi$ is not
usually known is, generally speaking, unimportant in the application of this
theorem. This is because it is often used with some limiting argument in
which $b \to a$, so that $\xi \to a$ also. A formal statement of the theorem is as
follows.

Fig. 5-10 Illustration of the mean value theorem.



theorem 5-12 (mean value theorem for derivatives) If f(x) is a real valued
function that is continuous in [a, b] and differentiable in (a, b), then there
exists a point $\xi$ interior to (a, b) such that

$$\frac{f(b) - f(a)}{b - a} = f'(\xi).$$

The existence of more than one point $\xi$ in (a, b) at which this result is
true is not precluded. This is so because it is only asserted that such a point
exists, and not that there is necessarily only one such point. Such is the case,
for example, in Fig. 5-10 since as was remarked, $f'(\xi) = f'(\eta)$ with $\xi \neq \eta$,
though both points $\xi$ and $\eta$ are interior to (a, b).

Many people would regard the argument above as proof enough of the 
mean value theorem, but for the more critical reader we now offer the 
promised proof based on Rolle's theorem. 

Proof As with the proofs of many mathematical theorems, our result is 
established more easily by a somewhat artificial approach than by a direct 
method. Here we shall utilize the intuitively obtained result above to suggest 
the form of a special function F(x) to which Rolle's theorem can be applied, 
thereby yielding the desired result. 



SEC 5-3 SOME IMPORTANT CONSEQUENCES OF DIFFERENTIABILITY / 207 

Specifically, since by implication the result depends on f(x) and x, we
shall try to find the simplest function F(x) that depends on f(x) and x, 
that is continuous in [a, b] and is differentiable in (a, b), and is such that 
F(a) = F(b). The value of F(a) may be assigned arbitrarily and F(x) will 
still satisfy Rolle's theorem, so to simplify slightly the working we shall 
assume that F(a) = F(b) = 0. 

We consider the obvious function 

F(x) = A + Bx+f(x) 

which clearly satisfies the continuity and differentiability conditions of 
Rolle's theorem. The constants A and B must be chosen in order that 
F(a) = F(b) = 0. 

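Thus

$$0 = A + Ba + f(a)$$

and

$$0 = A + Bb + f(b),$$

from which it follows that

$$A = a\left(\frac{f(b) - f(a)}{b - a}\right) - f(a), \qquad B = -\frac{f(b) - f(a)}{b - a}.$$

Hence F(x) has the form

$$F(x) = f(x) - f(a) + \left(\frac{f(b) - f(a)}{b - a}\right)(a - x).$$
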

Thus we have succeeded in finding a function F(x) with the desired properties 
which satisfies Rolle's theorem. Differentiating F(x) we obtain 



$$F'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}.$$

Now by Rolle's theorem there exists a point $\xi$, with $a < \xi < b$, such that
$F'(\xi) = 0$ and so we have our desired result

$$\frac{f(b) - f(a)}{b - a} = f'(\xi).$$

Since we may write $\xi = a + \theta(b - a)$, where $0 < \theta < 1$, this result is
sometimes expressed in the following form attributable to Cauchy,

$$f(b) - f(a) = (b - a) f'[a + \theta(b - a)] \quad \text{with } 0 < \theta < 1.$$

By applying the same arguments to a suitably constructed function
$\varphi(x)$, analogous to F(x), it is a simple matter to prove the following extension
of the mean value theorem due to Cauchy. (See Problem 5-37.)




Corollary 5-12 If g'(x) = h'(x) at all points of [a, b], then g(x) = h(x)
+ constant in [a, b].

Proof Set f = g − h in Theorem 5-12 applied to the interval [a, x]. Since
f' = g' − h' = 0 throughout, the theorem gives f(x) − f(a) = 0. Then
g(x) − h(x) = g(a) − h(a) = constant and the result follows.

theorem 5-13 (Cauchy extended mean value theorem) If f(x) and g(x)
are real valued functions that are continuous in [a, b] and differentiable in
(a, b) and $g'(x) \neq 0$ in (a, b), then there exists a point $\xi$ interior to (a, b)
such that

$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(\xi)}{g'(\xi)}.$$

5-3 (d) Indeterminate forms — L'Hospital's rule 
Limits such as $\lim_{x \to 0} (\sin ax)/x$, which apparently tend to the form 0/0, have
already been encountered and given meaning in special cases. A closely
related problem is that of giving meaning to the limit of a quotient which
apparently tends to $\infty/\infty$. These limit problems are both called indeterminate
forms. One of the most obvious applications of the extended mean value
theorem is to resolve the value of the limit in either of these situations, and
we now prove the simplest statement of a useful result generally known as
L'Hospital's Rule.

theorem 5-14 (first form of L'Hospital's rule) If f(x) and g(x) are real
valued differentiable functions at $x = x_0$ and,

(a) $f(x_0) = g(x_0) = 0$,

(b) $\lim_{x \to x_0} \dfrac{f'(x)}{g'(x)} = L$, where L is either a real number or infinity,

then

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)} = L.$$

Proof Apply the extended mean value theorem to the functions f(x) and
g(x) defined on the interval $[x, x_0]$ and use condition (a) to obtain

$$\frac{f(x)}{g(x)} = \frac{f'(\xi)}{g'(\xi)},$$

where $x < \xi < x_0$.

Now $x \to x_0$ implies that $\xi \to x_0$, so that by condition (b) we have the
desired result

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{\xi \to x_0} \frac{f'(\xi)}{g'(\xi)} = L.$$

The fact that the variable $\xi$ appears in the second limit in place of the x
stated in the theorem is unimportant. Its function is simply that of a variable
and the symbol used to denote it is immaterial.

In general, when the symbol used to denote a variable is unimportant 
because it only appears in some intermediate calculation, the details of which 
do not concern us, we shall call it a dummy variable. 

A useful extension of L'Hospital's rule is contained in the following
corollary which allows examination of limits which tend to the form $\infty/\infty$.

Corollary 5-14 If $\varphi(x)$ and $\psi(x)$ are real valued differentiable functions at
$x = x_0$ and,

(a) $\lim_{x \to x_0} \varphi(x) \to \pm\infty$, $\lim_{x \to x_0} \psi(x) \to \pm\infty$,

(b) $\lim_{x \to x_0} \dfrac{\varphi'(x)}{\psi'(x)} = L$, where L is either a real number or infinity,

then

$$\lim_{x \to x_0} \frac{\varphi(x)}{\psi(x)} = \lim_{x \to x_0} \frac{\varphi'(x)}{\psi'(x)} = L.$$

Proof Apply the extended mean value theorem to the quotient $\varphi(x)/\psi(x)$ in
the open interval $(x, x_1)$ with $x_0 < x < x_1$, and write the result in the form

$$\frac{\varphi(x)}{\psi(x)} = \left[\frac{1 - \dfrac{\psi(x_1)}{\psi(x)}}{1 - \dfrac{\varphi(x_1)}{\varphi(x)}}\right] \frac{\varphi'(\xi)}{\psi'(\xi)},$$

where $x < \xi < x_1$. Then, taking $x_1$ fixed and arbitrarily close to $x_0$ so that
$\xi \to x_0$, allow $x \to x_0$. The first factor on the right-hand side then approaches
arbitrarily close to unity thereby giving rise to the stated result. A
modification of this argument shows that the result is also true if $x_0 \to \infty$.

Example 5-9 Determine the value of the following indeterminate forms
using L'Hospital's rule and Corollary 5-14:

(a) $\lim_{x \to 0} \dfrac{\sin ax}{x}$;

(b) $\lim_{x \to 1} \dfrac{x^3 + 3x^2 - 2x - 2}{2x^2 - x - 1}$;

(c) $\lim_{x \to 0} \dfrac{\sin 3x}{x^3}$;

(d) $\lim_{x \to \frac{1}{2}\pi} \dfrac{\tan 3x}{\tan x}$;

(e) $\lim_{x \to 0} \dfrac{(a/x)}{\cot bx}$.


Solution (a) This is of the form $\lim f/g \to 0/0$ with $f(x) = \sin ax$ and
g(x) = x. As $f'(x) = a \cos ax$ and g'(x) = 1 it follows that

$$\lim_{x \to 0} \frac{\sin ax}{x} = \lim_{x \to 0} \frac{a \cos ax}{1} = a.$$

This confirms the limit that was obtained by a different method in Chapter 3. 

(b) This is also of the form $\lim f/g \to 0/0$ but this time with $f(x) = x^3
+ 3x^2 - 2x - 2$ and $g(x) = 2x^2 - x - 1$. It follows that $f'(x) = 3x^2
+ 6x - 2$ and $g'(x) = 4x - 1$ so that

$$\lim_{x \to 1} \frac{x^3 + 3x^2 - 2x - 2}{2x^2 - x - 1} = \lim_{x \to 1} \frac{3x^2 + 6x - 2}{4x - 1} = \frac{7}{3}.$$

(c) This is again of the form $\lim f/g \to 0/0$ with $f(x) = \sin 3x$ and
$g(x) = x^3$. Hence $f'(x) = 3 \cos 3x$ and $g'(x) = 3x^2$ so that

$$\lim_{x \to 0} \frac{\sin 3x}{x^3} = \lim_{x \to 0} \frac{\cos 3x}{x^2} \to +\infty.$$

(d) This is of the form $\lim f/g \to \infty/\infty$ with $f(x) = \tan 3x$ and $g(x)
= \tan x$. Hence $f'(x) = 3 \sec^2 3x$ and $g'(x) = \sec^2 x$ and by Corollary 5-14,

$$\lim_{x \to \frac{1}{2}\pi} \frac{\tan 3x}{\tan x} = \lim_{x \to \frac{1}{2}\pi} \frac{3 \sec^2 3x}{\sec^2 x} = 3 \lim_{x \to \frac{1}{2}\pi} \frac{\cos^2 x}{\cos^2 3x}.$$

This is again an indeterminate form, but now of the type 0/0. Applying
Theorem 5-14 we have

$$3 \lim_{x \to \frac{1}{2}\pi} \frac{\cos^2 x}{\cos^2 3x} = 3 \lim_{x \to \frac{1}{2}\pi} \frac{2 \sin x \cos x}{6 \sin 3x \cos 3x} = \lim_{x \to \frac{1}{2}\pi} \left(\frac{\sin x}{\sin 3x}\right) \cdot \lim_{x \to \frac{1}{2}\pi} \left(\frac{\cos x}{\cos 3x}\right)$$

and hence, since $\sin x / \sin 3x \to -1$ as $x \to \frac{1}{2}\pi$,

$$\lim_{x \to \frac{1}{2}\pi} \frac{\tan 3x}{\tan x} = -\lim_{x \to \frac{1}{2}\pi} \frac{\cos x}{\cos 3x}.$$

This last result is yet again an indeterminate form of the type 0/0 so that a
further application of Theorem 5-14 finally gives

$$\lim_{x \to \frac{1}{2}\pi} \frac{\tan 3x}{\tan x} = -\lim_{x \to \frac{1}{2}\pi} \frac{\sin x}{3 \sin 3x} = \frac{1}{3}.$$

(e) This is of the form $\lim f/g \to \infty/\infty$ but it is easily seen that an
application of Corollary 5-14 will not simplify the limit to be evaluated. Instead, we
rewrite the limit in the form

$$\lim_{x \to 0} \frac{(a/x)}{\cot bx} = \lim_{x \to 0} \frac{a \tan bx}{x},$$

when it is seen that the alternative form is of the type $\lim f/g \to 0/0$ with
$f(x) = a \tan bx$ and g(x) = x. Now $f'(x) = ab \sec^2 bx$ and g'(x) = 1 so that
by Theorem 5-14,

$$\lim_{x \to 0} \frac{(a/x)}{\cot bx} = \lim_{x \to 0} \frac{ab \sec^2 bx}{1} = ab.$$
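
The limits found in Example 5-9 can be checked roughly by evaluating each quotient at a point close to the limit point. This is only a numerical sanity check under stated assumptions, not a proof: the values a = 2, b = 3 and the offset h are arbitrary illustrative choices, and the variable names are hypothetical.

```python
import math

# Illustrative values (assumptions): the text leaves a and b general.
a, b = 2.0, 3.0
h = 1e-5  # small offset from the limit point

# (a) sin(ax)/x near x = 0; the text obtains the limit a
val_a = math.sin(a * h) / h

# (b) quotient of polynomials near x = 1; the text obtains 7/3
x = 1.0 + h
val_b = (x**3 + 3 * x**2 - 2 * x - 2) / (2 * x**2 - x - 1)

# (c) sin(3x)/x^3 near x = 0; the text finds it tends to +infinity
val_c = math.sin(3 * h) / h**3

# (d) tan(3x)/tan(x) near x = pi/2; the text obtains 1/3
x = math.pi / 2 + h
val_d = math.tan(3 * x) / math.tan(x)

# (e) (a/x)/cot(bx) near x = 0; the text obtains ab
val_e = (a / h) / (1.0 / math.tan(b * h))

print(val_a, val_b, val_c, val_d, val_e)
```

Each printed value should lie close to the corresponding limit, apart from (c), which simply becomes very large as h is reduced.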

5-3 (e) Identification of extrema 

We return to the topic of extrema and, in particular, to the identification of 
functional behaviour at stationary values by means of the mean value 
theorem. 

Suppose that a real valued function f(x) is differentiable in the interval 
(a, b) and has a maximum at an interior point xo of (a, b). 

Then if h is assumed to be positive and we consider the interval
$[x_0 - h, x_0]$ to the left of $x_0$, by the mean value theorem

$$\frac{f(x_0) - f(x_0 - h)}{h} = f'(\xi),$$

where $x_0 - h < \xi < x_0$.

Now by supposition h > 0 and as $x_0$ is a maximum, the numerator of
this expression will also be positive showing that $f'(\xi) > 0$. Hence by allowing
h to tend to zero, it follows that $\xi \to x_0$ and we have shown that to the
immediate left of the maximum we must have $f' > 0$.

To the right of the maximum, and in the interval $[x_0, x_0 + h]$, the same
argument shows that

$$\frac{f(x_0 + h) - f(x_0)}{h} = f'(\eta),$$

where $x_0 < \eta < x_0 + h$. This numerator is negative so that to the immediate
right of the maximum we must have $f' < 0$.

Similar arguments applied to a minimum and a point of inflection with
a horizontal tangent yield the following useful theorem, illustrated in Fig. 5-11.

theorem 5-15 (identification of extrema using first derivative) If f(x) is a
real valued differentiable function in the neighbourhood of a point $x_0$ at
which $f'(x_0) = 0$ then:

(a) the function has a maximum at $x_0$ if $f'(x) > 0$ to the left of $x_0$ and
$f'(x) < 0$ to the right of $x_0$;

(b) the function has a minimum at $x_0$ if $f'(x) < 0$ to the left of $x_0$ and
$f'(x) > 0$ to the right of $x_0$;

(c) the function has a point of inflection with zero gradient at $x_0$ if
$f'(x)$ has the same sign to the left and right of $x_0$.

Fig. 5-11 Stationary values of y = f(x): (a) local maximum; (b) local minimum;
(c) point of inflection with zero gradient.

In many books these results are regarded as intuitively obvious deductions 
from the geometrical interpretation of a derivative in conjunction with the 
behaviour of the graph of the function. However we have discussed them 
formally here as an illustration of an important consequence of the mean 
value theorem. 




Example 5-10 We again consider the functions of Example 5-8.

Case (a) $f(x) = \frac{1}{3}x^3 + 2x^2 + 3x + 1$ with stationary points $x = \xi$ at
$\xi = -1$ and $\xi = -3$. As $f'(x) = x^2 + 4x + 3$ it follows that to the
immediate left of $\xi = -1$ we have $f' < 0$, whilst to the immediate right $f' > 0$,
showing that $\xi = -1$ corresponds to a minimum. A similar argument shows
that $\xi = -3$ corresponds to a maximum.

Case (b) $f(x) = (2x - 5)x^{2/3}$ with the one stationary point $x = \xi$ at
$\xi = 1$. As $f'(x) = 2x^{2/3} + 2(2x - 5)/3x^{1/3}$ it follows that $f' < 0$ to the
immediate left of $\xi = 1$ and $f' > 0$ to the immediate right. Hence $\xi = 1$
corresponds to a minimum. As Theorem 5-15 stands, since f is not
differentiable at the origin, the maximum that occurs there must be identified as in
Example 5-8. However a trivial modification of the proof would show that
results (a) and (b) of the theorem are still valid if f is not differentiable at $x_0$.
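
The sign test of Theorem 5-15 is easy to mechanize. The sketch below is an illustration only: the function name `classify_stationary` and the probe distance h are assumptions, and sampling the derivative at just two nearby points is a crude stand-in for "to the immediate left and right".

```python
def classify_stationary(df, x0, h=1e-4):
    """Classify a stationary point x0 from the sign of df on either side.

    Hypothetical helper following Theorem 5-15; it samples the
    derivative at x0 - h and x0 + h only.
    """
    left, right = df(x0 - h), df(x0 + h)
    if left > 0 and right < 0:
        return "maximum"
    if left < 0 and right > 0:
        return "minimum"
    if left * right > 0:
        return "inflection"
    return "undetermined"


# Case (a) of Example 5-10: f'(x) = x^2 + 4x + 3
df_a = lambda x: x**2 + 4 * x + 3
# Case (b) of Example 5-10: f'(x) = 2x^(2/3) + 2(2x - 5)/(3x^(1/3)), taken near x = 1
df_b = lambda x: 2 * x ** (2 / 3) + 2 * (2 * x - 5) / (3 * x ** (1 / 3))

print(classify_stationary(df_a, -1.0))  # minimum
print(classify_stationary(df_a, -3.0))  # maximum
print(classify_stationary(df_b, 1.0))   # minimum
```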

5-3 (f) Differentials

In using the notation dy/dx to represent the derivative of the dependent
variable y with respect to x we have thus far been careful to emphasize that
dy/dx is simply a number defined by a limit. Although suggestive of
increments, dy and dx taken separately have as yet no individual meaning. In
many applications, particularly in differential equations which we encounter
later, it is convenient to work with actual quantities dy and dx which we will
call differentials.

However differentials must obviously be defined in a manner consistent
with the notation dy/dx when it is used to denote the derivative with respect
to x of the function y defined by

$$y = f(x). \tag{5-8}$$

We achieve this by defining dy, the first-order differential of y, by

$$dy = f'(x) \cdot \Delta x, \tag{5-9}$$

where $\Delta x$ is an increment in x of arbitrary size.

However, if, for the moment, we regard the independent variable x as a
function of x we can write x = g(x) with g(x) = x. Then by the above
argument dx, the first-order differential of x, is defined by

$$dx = 1 \cdot \Delta x, \tag{5-10}$$

showing that we may with meaning write Eqn (5-9) in the form

$$dy = f'(x)\,dx. \tag{5-11}$$

When needed, the actual increment in y consequent upon an increment
$\Delta x$ in x will be denoted by $\Delta y$. In general the differential dy and the increment
$\Delta y$ are distinct quantities and the interrelationship between them is indicated
in Fig. 5-12.

Fig. 5-12 Differentials dx and $\Delta y$.

In more advanced treatments the use of differentials is strictly avoided on 
account of logical difficulties encountered with their definition. However 
they are so useful that we shall ignore these objections and use them freely 
whenever necessary. 

It is an immediate consequence of this that if

$$y = k_1 f(x) + k_2 g(x)$$

then by Theorem 5-4,

$$dy = k_1 f'(x)\,dx + k_2 g'(x)\,dx$$

or, equivalently, in symbolic notation

$$d(k_1 f + k_2 g) = k_1\,df + k_2\,dg. \tag{5-12}$$

If we have

$$y = f(x)g(x)$$

then by Theorem 5-5,

$$dy = g(x)f'(x)\,dx + f(x)g'(x)\,dx$$

or, equivalently, in symbolic notation

$$d(fg) = g\,df + f\,dg. \tag{5-13}$$

Finally, if

$$y = f(x)/g(x)$$

then by Theorem 5-8,

$$dy = \frac{g(x)f'(x)\,dx - f(x)g'(x)\,dx}{g^2(x)}$$

or, equivalently, in symbolic notation

$$d\left(\frac{f}{g}\right) = \frac{g\,df - f\,dg}{g^2}. \tag{5-14}$$


Example 5-11 If $f(x) = \sin(x^2 + 4)$ and $g(x) = x^3$ find the differentials:
(a) $d(3f + g)$;
(b) $d(fg)$;
(c) $d(f/g)$.

Solution

(a) $d(3f + g) = d[3 \sin(x^2 + 4) + x^3]$

$= 3 \cos(x^2 + 4)\,d(x^2 + 4) + 3x^2\,dx$
$= 6x \cos(x^2 + 4)\,dx + 3x^2\,dx.$

(b) $d(fg) = d[x^3 \sin(x^2 + 4)]$

$= 3x^2 \sin(x^2 + 4)\,dx + x^3 \cos(x^2 + 4)\,d(x^2 + 4)$
$= 3x^2 \sin(x^2 + 4)\,dx + 2x^4 \cos(x^2 + 4)\,dx.$

(c) $d\left(\dfrac{f}{g}\right) = d\left[\dfrac{\sin(x^2 + 4)}{x^3}\right]$

$= \dfrac{x^3 \cos(x^2 + 4)\,d(x^2 + 4) - 3x^2 \sin(x^2 + 4)\,dx}{x^6}$

$= \dfrac{2x^2 \cos(x^2 + 4)\,dx - 3 \sin(x^2 + 4)\,dx}{x^4}.$



For small values of dx, the differential dy is obviously a reasonable
approximation to the actual increment $\Delta y$. This simple observation is often
utilized to relate small changes in dependent and independent variables as
the next example shows.

Example 5-12 The pressure p of a polytropic gas is related to the density $\rho$
by the expression

$$p = A\rho^{\gamma},$$

where A is a constant. Deduce the relationship connecting the differentials
dp and $d\rho$. Given that $\gamma = 3/2$ and $\rho = 4$, and taking dp as an approximation
to the actual pressure change $\Delta p$, compute the approximate new pressure if
$\rho$ is increased by 0.1. Compare the approximate and exact results.




Solution In this case $p = f(\rho)$ with $f(\rho) = A\rho^{\gamma}$. Hence $f'(\rho) = \gamma A \rho^{\gamma - 1}$ and
thus the desired differential relation is

$$dp = \gamma A \rho^{\gamma - 1}\,d\rho.$$

When $\gamma = 3/2$ and $\rho = 4$ it follows from the stated pressure-density
law that the initial pressure $p_0$ is

$$p_0 = 4^{3/2} A = 8A.$$

Using the differential relation to compute the approximate pressure increase
represented by the differential dp we find

$$dp = (3/2) \cdot A \cdot 4^{1/2} \cdot (0.1) = 0.3A.$$

Hence the approximate new pressure is $p_0 + dp = 8.3A$.

The exact new pressure $p_0 + \Delta p$ may be computed from the pressure-density
law by setting $\rho = 4.1$ to obtain

$$p_0 + \Delta p = (4.1)^{3/2} A = 8.302A.$$

This shows that in this case the differential relation gives a good
approximation to the pressure increase.
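
The arithmetic of Example 5-12 can be reproduced in a few lines. In this sketch the constant A is set to 1 purely for convenience (an assumption; every pressure simply scales with A), and the variable names are illustrative.

```python
# Polytropic law p = A * rho**gamma, with A = 1 assumed for illustration.
A = 1.0
gamma = 1.5
rho = 4.0
d_rho = 0.1

p0 = A * rho**gamma                            # initial pressure, 8A
dp = gamma * A * rho ** (gamma - 1) * d_rho    # differential dp, 0.3A
approx_new = p0 + dp                           # estimate from the differential
exact_new = A * (rho + d_rho) ** gamma         # value from the law itself

print(approx_new, exact_new, exact_new - approx_new)
```

The difference between the two printed pressures is the error committed by replacing the increment $\Delta p$ by the differential dp.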

5 -4 Higher derivatives — applications 

We have seen how differentiation applied to a suitable function f(x) yields as
a result another function f'(x), the derivative of f(x) with respect to x. If
the function f'(x) is itself differentiable then a repetition of differentiation
will result in a further function that we shall denote by f''(x) and will call the
second derivative of f(x) with respect to x. We may usefully employ the
dynamical problem that served to introduce the notion of a derivative to
give meaning to the notion of a second derivative, for if f'(x) represents a
velocity, then f''(x) represents an acceleration. If the function f''(x) is
itself differentiable then it is customary to denote the third derivative of f(x)
by f'''(x) after which, if necessary, further derivatives are conventionally
denoted by the use of superscript roman numerals. Hence the sixth derivative
of a suitably differentiable function f(x) would be written $f^{\mathrm{vi}}(x)$.

A better notation than this is needed for general purposes and the two
most often used because of their versatility are

$$\frac{d^n y}{dx^n} \quad \text{or} \quad D^n y.$$

These both represent the nth derivative with respect to x of y = f(x) and
for their determination require the successive application of differentiation
n times. The number n is the order of the derivative and the symbol D
symbolizes the operation of differentiation. Computationally the definition
of the nth derivative of y with respect to x is equivalent to using either of
these two equivalent algorithms



SEC 5-4 HIGHER DERIVATIVES -APPLICATIONS / 217 



$$\frac{d}{dx}\left(\frac{d^{n-1}y}{dx^{n-1}}\right) = \frac{d^n y}{dx^n} \quad \text{or} \quad D[D^{n-1}y] = D^n y. \tag{5-15}$$

These expressions are, of course, only meaningful when n is an integer
and we shall agree to the convention $D^0 y = y$.

Geometrically, the function $d^n y/dx^n$ bears to the graph of $d^{n-1}y/dx^{n-1}$
the same relationship as does the function dy/dx to the graph of y. Namely
$d^n y/dx^n$ at $x = x_0$ is the gradient of the graph of $d^{n-1}y/dx^{n-1}$ as a function
of x at the same point $x = x_0$.

Example 5-13 Determine $dy/dx$, $d^2y/dx^2$, and $d^3y/dx^3$ given that y = f(x)
with:

(a) f(x) = cos mx; 

(b) f(x) = tan x; 

(c) f(x) = 1/(1 + x). 

If possible make deductions about the nth derivative. 

Solution 

(a) $\dfrac{dy}{dx} = f'(x) = \dfrac{d}{dx}(\cos mx) = -m \sin mx,$

$\dfrac{d^2y}{dx^2} = \dfrac{d}{dx}\left(\dfrac{dy}{dx}\right) = \dfrac{d}{dx}[-m \sin mx] = -m^2 \cos mx,$

$\dfrac{d^3y}{dx^3} = \dfrac{d}{dx}\left(\dfrac{d^2y}{dx^2}\right) = \dfrac{d}{dx}[-m^2 \cos mx] = m^3 \sin mx.$

An inductive argument easily shows that the nth derivative $(d^n/dx^n)(\cos mx)
= m^n \cos[mx + (n\pi/2)]$.

In respect of the function $y = \cos mx$, it is of importance to notice that
the simple algebraic equation

$$\frac{d^2y}{dx^2} + m^2 y = 0$$

connects the function and its second derivative. Because this equation
involves derivatives it is a differential equation. Such equations are very
important in both mathematics and the mathematical sciences; the last three
chapters of this book provide an introductory study of them.

(b) $\dfrac{dy}{dx} = f'(x) = \dfrac{d}{dx}(\tan x) = \sec^2 x,$

$\dfrac{d^2y}{dx^2} = \dfrac{d}{dx}\left(\dfrac{dy}{dx}\right) = \dfrac{d}{dx}(\sec^2 x) = 2 \sec^2 x \tan x,$

$\dfrac{d^3y}{dx^3} = \dfrac{d}{dx}\left(\dfrac{d^2y}{dx^2}\right) = \dfrac{d}{dx}(2 \sec^2 x \tan x) = 2 \sec^2 x\,(2 \tan^2 x + \sec^2 x).$

There is no simple rule by which $(d^n/dx^n)(\tan x)$ may be computed.

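(c) $\dfrac{dy}{dx} = f'(x) = \dfrac{d}{dx}\left(\dfrac{1}{1 + x}\right) = \dfrac{-1}{(1 + x)^2},$

$\dfrac{d^2y}{dx^2} = \dfrac{d}{dx}\left(\dfrac{dy}{dx}\right) = \dfrac{d}{dx}\left[\dfrac{-1}{(1 + x)^2}\right] = \dfrac{2}{(1 + x)^3},$

$\dfrac{d^3y}{dx^3} = \dfrac{d}{dx}\left(\dfrac{d^2y}{dx^2}\right) = \dfrac{d}{dx}\left[\dfrac{2}{(1 + x)^3}\right] = \dfrac{-3!}{(1 + x)^4}.$

It follows by induction that

$$\frac{d^n}{dx^n}\left(\frac{1}{1 + x}\right) = \frac{(-1)^n n!}{(1 + x)^{n+1}}.$$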
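
The induction in part (c) amounts to applying the power rule repeatedly to a term of the form $c/(1 + x)^k$. A minimal sketch of that bookkeeping follows; the function name `nth_derivative_coeff` is an illustrative assumption, not notation from the text.

```python
from math import factorial


def nth_derivative_coeff(n):
    """Return (c, k) with d^n/dx^n [1/(1+x)] = c / (1+x)**k.

    Each differentiation maps c/(1+x)^k to (-k*c)/(1+x)^(k+1).
    """
    c, k = 1, 1
    for _ in range(n):
        c, k = -k * c, k + 1
    return c, k


for n in range(6):
    c, k = nth_derivative_coeff(n)
    # Compare with the closed form (-1)^n n! / (1+x)^(n+1)
    print(n, (c, k), ((-1) ** n * factorial(n), n + 1))
```

The two pairs printed on each line should agree, which is the content of the inductive formula for these values of n.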

In general, functions are not capable of differentiation an indefinite
number of times, and at some stage they usually become non-differentiable.
A simple example of a function that is not differentiable an indefinite number
of times, though for a different reason from the above, is $x^n$, with n an integer.
The nth derivative of $x^n$ is the constant number n! so that the (n + 1)th and
all subsequent derivatives are identically zero.



5-4 (a) Leibnitz's theorem 

This useful theorem is a consequence of Theorem 5-5 and facilitates the
computation of high-order derivatives of the product f(x)g(x) of the two
functions f(x) and g(x), in terms of the derivatives of the individual functions
f(x) and g(x) themselves.

The result is, perhaps, best expressed in terms of the symbolic differentia- 
tion operator D, and for our starting point we now re-express the result of 
Theorem 5-5 in terms of the operator D. 

$$D(fg) = f\,Dg + g\,Df.$$

Assuming functions f(x) and g(x) are suitably differentiable, a further
application of the operator D together with Theorem 5-5 yields

$$D^2(fg) = D(f\,Dg + g\,Df) = Df \cdot Dg + f\,D^2g + Dg \cdot Df + g\,D^2f.$$

However

$$Df \cdot Dg = \frac{df}{dx}\frac{dg}{dx} = \frac{dg}{dx}\frac{df}{dx} = Dg \cdot Df,$$

so that

$$D^2(fg) = f\,D^2g + 2\,Df \cdot Dg + g\,D^2f. \tag{5-16}$$

A repetition of the same argument shows that

$$D^3(fg) = f\,D^3g + 3\,Df \cdot D^2g + 3\,D^2f \cdot Dg + g\,D^3f. \tag{5-17}$$

The coefficients involved in Eqns (5-16) and (5-17) are seen to belong to
the general pattern of binomial coefficients in the expansion of $(a + b)^n$,
namely to the rows of numbers

(n = 0)  $\binom{0}{0}$
(n = 1)  $\binom{1}{0}$  $\binom{1}{1}$
(n = 2)  $\binom{2}{0}$  $\binom{2}{1}$  $\binom{2}{2}$
(n = 3)  $\binom{3}{0}$  $\binom{3}{1}$  $\binom{3}{2}$  $\binom{3}{3}$

or, equivalently, to the rows

(n = 0)  1
(n = 1)  1  1
(n = 2)  1  2  1
(n = 3)  1  3  3  1


This suggests that in evaluating D n (fg), the coefficients arising should 
belong to the (n + l)th row of either of these arrays, which are Pascal 
triangles. That this is so can be proved fairly easily, using an inductive argu- 
ment similar to that used to prove the binomial theorem. We shall not give 
the details, preferring simply to state the theorem. 

theorem 5-16 (Leibnitz's theorem) If f(x) and g(x) are n times differentiable
real valued functions in the interval (a, b), then

$$D^n(fg) = \sum_{k=0}^{n} \binom{n}{k} D^{n-k}f \cdot D^k g.$$

The value and power of this is best shown by an application. 

Example 5-14 Use Leibnitz's theorem to evaluate $(d^3/dx^3)(x^6 \sin x)$.

Solution Setting n = 3 in the general result gives

$$D^3(fg) = g\,D^3f + 3\,D^2f \cdot Dg + 3\,Df \cdot D^2g + f\,D^3g.$$

This is, of course, result (5-17) differently expressed. Now we make the
identifications $f(x) = x^6$ and $g(x) = \sin x$ when it follows that $Df = 6x^5$,
$D^2f = 30x^4$, $D^3f = 120x^3$, and $Dg = \cos x$, $D^2g = -\sin x$, $D^3g = -\cos x$.
Hence substitution into the above result gives

$$D^3(x^6 \sin x) = 120x^3 \sin x + 90x^4 \cos x - 18x^5 \sin x - x^6 \cos x.$$
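
Leibnitz's theorem translates directly into a sum over binomial coefficients. The sketch below is illustrative only: the helper `leibniz` and the idea of passing each function as a list of its successive derivatives are assumptions made for the example, and the derivatives of $x^6$ and $\sin x$ are those listed in Example 5-14.

```python
from math import comb, sin, cos


def leibniz(f_derivs, g_derivs, n, x):
    """Evaluate D^n(fg) at x from lists [f, Df, D^2 f, ...] and [g, Dg, ...]."""
    return sum(
        comb(n, k) * f_derivs[n - k](x) * g_derivs[k](x)
        for k in range(n + 1)
    )


# f(x) = x^6 and g(x) = sin x with their first three derivatives
f_derivs = [lambda x: x**6, lambda x: 6 * x**5,
            lambda x: 30 * x**4, lambda x: 120 * x**3]
g_derivs = [sin, cos, lambda x: -sin(x), lambda x: -cos(x)]


def closed_form(x):
    """Result of Example 5-14 for D^3(x^6 sin x)."""
    return (120 * x**3 * sin(x) + 90 * x**4 * cos(x)
            - 18 * x**5 * sin(x) - x**6 * cos(x))


x = 1.3  # arbitrary test point
print(leibniz(f_derivs, g_derivs, 3, x), closed_form(x))
```

The two printed numbers should coincide to rounding error, and setting n = 2 reproduces Eqn (5-16) in the same way.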

5-4 (b) Identification of extrema by second derivatives 

An important application of the second derivative of a function f(x) is to the
identification of the nature of its extrema. Let us suppose that f(x) is twice
differentiable and that $f'(x_0) = 0$ and $f''(x_0) = L < 0$.

Then from Definition 5-2 and the notion of a second derivative we must
have that

$$f''(x_0) = \lim_{x \to x_0} \frac{f'(x) - f'(x_0)}{x - x_0} = L < 0.$$

By supposition $f'(x_0) = 0$, so that

$$f''(x_0) = \lim_{x \to x_0} \frac{f'(x)}{x - x_0} = L < 0.$$

This limit must be independent of the manner in which x approaches $x_0$
so that we must consider separately the cases that x lies to the left or to the
right of $x_0$.

If x lies to the left of $x_0$ then $x - x_0 < 0$. Consequently, as the value L
of the limit is negative, the expression defining $f''(x_0)$ implies that to the
immediate left of $x_0$ it must be true that $f'(x) > 0$.

If x lies to the right of $x_0$ then $x - x_0 > 0$. Consequently, as the value L
of the limit is negative, the expression defining $f''(x_0)$ implies that to the
immediate right of $x_0$ it must be true that $f'(x) < 0$.

These results, in conjunction with Theorem 5-15 (a) prove that at a
stationary value $x_0$, for which $f''(x_0) < 0$, the function f(x) attains a maximum
value. An exactly similar argument proves that at a stationary value $x_0$, for
which $f''(x_0) > 0$, the function f(x) attains a minimum value.

To complete the argument, consider the situation in which $f''(x_0) = 0$.
It might be conjectured that this corresponds to a point of inflection; and to
establish the correctness of our intuition let us appeal to the geometrical
interpretation of a derivative as a gradient.

Suppose that $x_0$ corresponds to a point of inflection with zero gradient.
Then as x increases through the value $x_0$, either

(a) $f'(x)$ is initially positive and decreases to a minimum value $f'(x_0) = 0$,
thereafter increasing again (cf. Fig. 5-11 (c));

or,

(b) $f'(x)$ is initially negative and increases to a maximum value $f'(x_0) = 0$,
thereafter decreasing again.

In each case $x_0$ is a stationary value of the first derivative $f'(x)$, so that by
an application of Theorem 5-10 to the function $f'(x)$ we find that $f''(x_0) = 0$
at a point of inflection.

We have thus proved the following theorem. 

theorem 5-17 (identification of extrema using second derivatives) Let
f(x) be a real valued twice differentiable function in (a, b) with a stationary
point $x_0$ in (a, b), so that $f'(x_0) = 0$. Then, if

(a) $f''(x_0) < 0$ the function f(x) has a maximum at $x_0$,

(b) $f''(x_0) > 0$ the function f(x) has a minimum at $x_0$,

(c) $f''(x_0) = 0$ the function f(x) has a point of inflection at $x_0$ with zero
gradient provided that the sign of $f'(x)$ is the same to the immediate
left and right of $x_0$.

The proof of this theorem shows clearly what was asserted earlier; namely 
that a point of inflection on the graph of a function separates a region of 
convexity from a region of concavity. There is, of course, no necessity that 
this point should have associated with it a zero gradient. 

Following this argument to its logical conclusion we see that the proof of
(c) above need only involve the sign of $f'(x)$ to the left and right of $x_0$ when
$f'(x_0) = 0$, for then such arguments are needed to distinguish between an
extremum and a point of inflection. If $f'(x_0) \neq 0$ such problems do not arise
and it is sufficient to look for those values $\xi$ for which $f''(\xi) = 0$. We have
thus proved the following general result.

theorem 5-18 (location of points of inflection) If f(x) is a real valued
twice differentiable function then its points of inflection, if any, occur at the
numbers $\xi$ for which $f''(\xi) = 0$ provided that $f'(\xi) \neq 0$. If however this is
not so, and $f'(\xi) = 0$, then $\xi$ corresponds to a point of inflection provided
that the sign of $f'(x)$ is the same to the immediate left and right of $\xi$.

It is left to the reader as an exercise to prove that when $f'(x_0) = f''(x_0) = 0$,
then provided $f'''(x_0)$ exists, our condition on $f'(x)$ may be replaced by the
requirement $f'''(x_0) \neq 0$. The proof is essentially similar to that given for
Theorem 5-17 though this time the starting point is the definition of $f'''(x_0)$
expressed as a limit. We give this result as a corollary.

Corollary 5-18 If f(x) is a real valued thrice differentiable function and
$f'(\xi) = f''(\xi) = 0$, then f(x) has a point of inflection at $x = \xi$ if $f'''(\xi) \neq 0$.

Example 5-15 Locate and identify the stationary values of the following
functions. Find any points of inflection they may have, together with the
gradient of the tangent line at such points:

(a) $f(x) = x^3 - 12x + 1$ in [−10, 10];

(b) $f(x) = \tan x$ in $[-\frac{1}{4}\pi, \frac{1}{4}\pi]$;

(c) $f(x) = (x - 1)^3$ in $(-\infty, \infty)$.

Solution (a) The stationary values are those numbers $\xi$ for which $f'(\xi) = 0$.
Hence as $f'(x) = 3x^2 - 12$, the stationary values are determined by the
equation

$$3\xi^2 - 12 = 0.$$

This has roots $\xi = 2$, $\xi = -2$ which both lie in [−10, 10] and are the desired
stationary values. As $f''(x) = 6x$, it follows that $f''(2) = 12 > 0$ and $f''(-2)
= -12 < 0$. Hence by Theorem 5-17, the point $\xi = 2$ is a minimum and the
point $\xi = -2$ is a maximum. Since the function has no other stationary
value there can be no point of inflection at which the tangent line has zero
gradient. However $f''(x) = 6x$ vanishes when x = 0, so that by Theorem
5-18 we see that x = 0 must correspond to a point of inflection. The gradient
at x = 0 is $f'(0) = -12$ which is the gradient of the desired tangent line to
the graph at the point of inflection.

(b) Here we have $f'(x) = \sec^2 x$ and clearly, since $\sec^2 x = 1 + \tan^2 x$,
it follows that $f'(x) \neq 0$ in $[-\frac{1}{4}\pi, \frac{1}{4}\pi]$. The function $f(x) = \tan x$ thus has no
stationary values in $[-\frac{1}{4}\pi, \frac{1}{4}\pi]$, though it assumes its greatest value at $\frac{1}{4}\pi$
and its least value at $-\frac{1}{4}\pi$. We have $f''(x) = 2 \sec^2 x \tan x$ which vanishes
for x = 0. Hence by Theorem 5-18, the function tan x has a point of inflection
at the origin at which the gradient of the tangent to the graph has the value
$f'(0) = 1$.

(c) We see that $f'(x) = 3(x - 1)^2$ and so the condition $f'(\xi) = 0$ yields
$\xi = 1$ as the single stationary value. However, $f''(x) = 6(x - 1)$ which shows
that we also have $f''(1) = 0$. Appealing to the last part of Theorem 5-18 we
see that, as $f'(x) = 3(x - 1)^2 > 0$ to both the left and right of x = 1, it
follows that $f(x) = (x - 1)^3$ has a point of inflection at that point. The tangent
line to the graph there has a zero gradient. Alternatively, as $f'''(x) = 6 \neq 0$,
the result also follows from Corollary 5-18.
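
The tests of Theorem 5-17 and Corollary 5-18 can be combined into one small routine. This is a sketch under stated assumptions: the name `classify_by_derivatives` and the tolerance used to decide when a derivative counts as zero are illustrative choices, and the caller is assumed to supply a genuine stationary point.

```python
def classify_by_derivatives(d2, d3, x0, tol=1e-12):
    """Classify a stationary point x0 (where f'(x0) = 0 is assumed).

    d2 and d3 are callables for the second and third derivatives.
    """
    second = d2(x0)
    if second < -tol:
        return "maximum"           # Theorem 5-17 (a)
    if second > tol:
        return "minimum"           # Theorem 5-17 (b)
    if abs(d3(x0)) > tol:
        return "inflection"        # Corollary 5-18
    return "undetermined"          # higher derivatives would be needed


# Example 5-15 (a): f(x) = x^3 - 12x + 1, so f'' = 6x and f''' = 6
print(classify_by_derivatives(lambda x: 6 * x, lambda x: 6.0, 2.0))    # minimum
print(classify_by_derivatives(lambda x: 6 * x, lambda x: 6.0, -2.0))   # maximum
# Example 5-15 (c): f(x) = (x - 1)^3, so f'' = 6(x - 1) and f''' = 6
print(classify_by_derivatives(lambda x: 6 * (x - 1), lambda x: 6.0, 1.0))  # inflection
```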

5-5 Partial differentiation 

The notion of continuity has already been extended so that it is meaningful 
in the context of functions of several independent variables. It is now appro- 
priate to extend the notion of a derivative in a similar fashion. For simplicity 
of argument we shall work with the function f(x, y) of two independent
variables, and in order to visualize its behaviour geometrically we will define 
a dependent variable by the equation 

u=f(x,y). (5-18) 



SEC 5-5 



PARTIAL DIFFERENTIATION / 223 



The function may then be represented as a surface in three dimensional 
space. 

A typical surface generated by a function of the form of Eqn (5-18) is
shown in Fig. 5-13 and, unlike functions of one independent variable, it is
necessary to define more than one first-order derivative. The idea involved is
simple: by holding one of the independent variables in f constant at some
value of interest, the function f then becomes a function of the single
remaining independent variable. We may then differentiate f as though it were a
function only of that one variable. By holding first x and then y constant in
this manner, two different derivatives may be defined which, because of their
manner of computation, will be called partial derivatives to distinguish them
from our earlier use of the term derivative. We shall now express these ideas
formally as a definition and set down the standard notation to be used.




Fig. 5- 13 Geometrical interpretation of partial derivatives. 



definition 5-5 (partial derivatives) Let f(x, y) be a function defined near
$(x_0, y_0)$. Suppose that

$$\lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} \tag{A}$$

exists and is independent of the direction of approach of x to $x_0$. Then f is
differentiable partially with respect to x at $(x_0, y_0)$. The value of the limit is
denoted by $f_x(x_0, y_0)$ or by $\partial f/\partial x|_{(x_0, y_0)}$ and called the first-order partial
derivative of f with respect to x at $(x_0, y_0)$.

Similarly, suppose that

$$\lim_{y \to y_0} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0} \tag{B}$$

exists and is independent of the direction of approach of y to $y_0$. Then f is
differentiable partially with respect to y at $(x_0, y_0)$. The limit is denoted by
$f_y(x_0, y_0)$ or by $\partial f/\partial y|_{(x_0, y_0)}$ and called the first-order partial derivative of f
with respect to y at $(x_0, y_0)$.

By analogy with ordinary derivatives, if f(x, y) is differentiable partially
with respect to x and y at all points of some region in the (x, y)-plane and
these derivatives are continuous, then we say f is differentiable in that region.
The operations of partial differentiation with respect to x and y are usually
denoted by the differentiation operators $\partial/\partial x$ and $\partial/\partial y$, respectively.

Let us now interpret these definitions in terms of Fig. 5-13. The function
$f(x, y_0)$ occurring in the numerator of limit (A) in Definition 5-5 is represented
in that figure by the intersection of the surface u = f(x, y) with the plane
$y = y_0$ which has been labelled $\Pi_1$. It is the curve $L_1$. The number $f_x(x_0, y_0)$
defined by limit (A) is the gradient of the tangent line $l_1$ to this curve at point
P. By requiring the limit to be independent of the direction of approach of x
to $x_0$, we have ensured that the tangent lines drawn to the curve at P, whether
from the left or the right, will have the same gradient. In simpler terms this
ensures that the curve $L_1$ is smooth and has no kink at P.

The function $f(x_0, y)$ occurring in the numerator of limit (B) in the
definition is represented in Fig. 5-13 by the intersection of the surface u = f(x, y)
with the plane $x = x_0$ which has been labelled $\Pi_2$. It is the curve $L_2$. The
number $f_y(x_0, y_0)$ defined by limit (B) is the gradient of the tangent line $l_2$
to this curve at point P.

Thus by differentiating partially we mean that, during the process of 
differentiation, the other independent variable is to be regarded as a constant. 
In consequence, all the rules of differentiation developed for functions of a 
single variable are also rules of partial differentiation, provided only that the 
functions involved are suitably differentiable. On account of this when, for
example, the operator $\partial/\partial x$ acts on a function only of $y$, say $g(y)$, that function
is to be regarded as a constant with respect to this operator and so
$(\partial/\partial x)[g(y)] = 0$. Similarly $(\partial/\partial y)[h(x)] = 0$.

Example 5-16 In each of the following cases compute $f_x$ and $f_y$ as functions of
$x$ and $y$. Use the result to determine the numerical value of these derivatives
at the stated points:

(a) $f(x, y) = x^3 + 2xy + 2y^2$; $(1, 2)$;
(b) $f(x, y) = x \sin xy + 3$; $(1, \tfrac{1}{2}\pi)$;
(c) $f(x, y) = x/(x^2 + y^2)$; $(1, 0)$.



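Solution

(a) $$f_x = \frac{\partial}{\partial x}[x^3 + 2xy + 2y^2] = \frac{\partial}{\partial x}[x^3] + 2y\frac{\partial}{\partial x}[x] + 2y^2\frac{\partial}{\partial x}[1],$$

whence

$$\frac{\partial f}{\partial x} = 3x^2 + 2y.$$

At the point (1, 2) we find that $\partial f/\partial x|_{(1,2)} = 7$. Similarly,

$$f_y = \frac{\partial}{\partial y}[x^3 + 2xy + 2y^2] = x^3\frac{\partial}{\partial y}[1] + 2x\frac{\partial}{\partial y}[y] + 2\frac{\partial}{\partial y}[y^2],$$

whence

$$\frac{\partial f}{\partial y} = 2x + 4y.$$

At the point (1, 2) we find that $\partial f/\partial y|_{(1,2)} = 10$.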



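(b) $$f_x = \frac{\partial}{\partial x}[x \sin xy + 3] = x\frac{\partial}{\partial x}[\sin xy] + \sin xy\,\frac{\partial}{\partial x}[x] + \frac{\partial}{\partial x}[3],$$

whence

$$\frac{\partial f}{\partial x} = xy \cos xy + \sin xy.$$

At the point $(1, \tfrac{1}{2}\pi)$ we find that $\partial f/\partial x|_{(1, \pi/2)} = 1$. Similarly,

$$f_y = \frac{\partial}{\partial y}[x \sin xy + 3] = x\frac{\partial}{\partial y}[\sin xy] + \frac{\partial}{\partial y}[3],$$

whence

$$\frac{\partial f}{\partial y} = x^2 \cos xy$$

and

$$\left.\frac{\partial f}{\partial y}\right|_{(1, \pi/2)} = 0.$$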



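(c) $$f_x = \frac{\partial}{\partial x}\left[\frac{x}{x^2 + y^2}\right] = \frac{1}{x^2 + y^2}\frac{\partial}{\partial x}[x] + x\frac{\partial}{\partial x}[(x^2 + y^2)^{-1}] = \frac{1}{x^2 + y^2} - \frac{x}{(x^2 + y^2)^2}\frac{\partial}{\partial x}[x^2 + y^2],$$

whence

$$\frac{\partial f}{\partial x} = \frac{1}{x^2 + y^2} - \frac{2x^2}{(x^2 + y^2)^2} = \frac{y^2 - x^2}{(x^2 + y^2)^2}.$$

At the point (1, 0) we find that $\partial f/\partial x|_{(1,0)} = -1$. Similarly,

$$f_y = \frac{\partial}{\partial y}\left[\frac{x}{x^2 + y^2}\right] = x\frac{\partial}{\partial y}[(x^2 + y^2)^{-1}] = \frac{-x}{(x^2 + y^2)^2}\frac{\partial}{\partial y}[x^2 + y^2],$$

whence

$$\frac{\partial f}{\partial y} = \frac{-2xy}{(x^2 + y^2)^2}$$

and so

$$\left.\frac{\partial f}{\partial y}\right|_{(1,0)} = 0.$$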
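The results of Example 5-16 can be checked numerically. The Python sketch below is only an illustration and is not part of the original text: the helper name `partial`, the step size `h` and the use of a central difference quotient are all assumptions made here. It approximates a partial derivative by varying one variable while holding the others fixed, which is exactly what the definition requires, and it works unchanged for functions of any number of variables.

```python
import math

def partial(f, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative of f at `point`
    with respect to the variable in position `index`. Every other variable
    is held fixed. (Hypothetical helper, for illustration only.)"""
    forward = list(point)
    backward = list(point)
    forward[index] += h
    backward[index] -= h
    return (f(*forward) - f(*backward)) / (2.0 * h)

def f_a(x, y):
    return x**3 + 2*x*y + 2*y**2      # Example 5-16 (a)

def f_b(x, y):
    return x * math.sin(x * y) + 3    # Example 5-16 (b)

def f_c(x, y):
    return x / (x**2 + y**2)          # Example 5-16 (c)

cases = [
    ("(a)", f_a, (1.0, 2.0)),
    ("(b)", f_b, (1.0, math.pi / 2)),
    ("(c)", f_c, (1.0, 0.0)),
]
for label, func, point in cases:
    # Index 0 differentiates with respect to x, index 1 with respect to y.
    print(label, partial(func, point, 0), partial(func, point, 1))
```

The printed pairs should lie close to (7, 10), (1, 0) and (-1, 0), in agreement with the values found analytically above, up to the small error inherent in a difference quotient.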




The notion of partial differentiation extends to functions of more than two
independent variables in an obvious manner. Suppose that the function
$f(x, y, z)$ is defined near the point $(x_0, y_0, z_0)$ then, provided the limits exist,
we define the three first-order partial derivatives $f_x$, $f_y$, and $f_z$ by the expressions

$$\left.\frac{\partial f}{\partial x}\right|_{(x_0, y_0, z_0)} = \lim_{x \to x_0} \frac{f(x, y_0, z_0) - f(x_0, y_0, z_0)}{x - x_0},$$

$$\left.\frac{\partial f}{\partial y}\right|_{(x_0, y_0, z_0)} = \lim_{y \to y_0} \frac{f(x_0, y, z_0) - f(x_0, y_0, z_0)}{y - y_0},$$

$$\left.\frac{\partial f}{\partial z}\right|_{(x_0, y_0, z_0)} = \lim_{z \to z_0} \frac{f(x_0, y_0, z) - f(x_0, y_0, z_0)}{z - z_0}.$$



Clearly a function of n independent variables will have n different first- 
order partial derivatives ; one with respect to each of the independent variables. 
The actual computation of these partial derivatives is carried out exactly as 
before. 

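Example 5-17 Find the first-order partial derivatives of
$f(x, y, z) = x^3 y^2 + 3 \sin yz + 2$.

Solution This function has three independent variables so we must obtain
three first-order partial derivatives, namely $f_x$, $f_y$, and $f_z$. First we have

$$f_x = \frac{\partial}{\partial x}[x^3 y^2 + 3 \sin yz + 2] = y^2\frac{\partial}{\partial x}[x^3] + 3 \sin yz\,\frac{\partial}{\partial x}[1] + \frac{\partial}{\partial x}[2],$$

so

$$\frac{\partial f}{\partial x} = 3x^2 y^2.$$

Next,

$$f_y = \frac{\partial}{\partial y}[x^3 y^2 + 3 \sin yz + 2] = x^3\frac{\partial}{\partial y}[y^2] + 3\frac{\partial}{\partial y}[\sin yz] + \frac{\partial}{\partial y}[2],$$

so

$$\frac{\partial f}{\partial y} = 2x^3 y + 3z \cos yz.$$

Finally,

$$f_z = \frac{\partial}{\partial z}[x^3 y^2 + 3 \sin yz + 2] = x^3 y^2\frac{\partial}{\partial z}[1] + 3\frac{\partial}{\partial z}[\sin yz] + \frac{\partial}{\partial z}[2],$$

so

$$\frac{\partial f}{\partial z} = 3y \cos yz.$$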

5-6 Total differential 

The idea of a differential, that was useful in ordinary differentiation, may 
also be developed to advantage in connection with partial differentiation. 
We first approach this problem from the geometrical standpoint, and then 
indicate how an analytical counterpart of these arguments can be produced. 

Let us consider Eqn (5-18) and its geometrical representation in Fig. 5-13.
The conditions for differentiability at $P$ ensure that the surface has a tangent
plane $\Pi$ at that point (why?), and it is to this plane that we now confine
our attention. An element of this tangent plane defined by the lines $l_1$ and $l_2$
through $P$ is depicted in Fig. 5-14. Obviously points on $\Pi$ close to $P$ must also
be close to those points on the surface $u = f(x, y)$ that lie vertically below
them. This suggests that for such points, the element of plane $\Pi$ neighbouring
$P$ represents a good approximation to the element of the curved surface
defining the function $u$ near to $P$. Thus variations of $u$ close to $P$ may, with
propriety, be approximated by the variations of the corresponding points on $\Pi$.

Since we are interested in variations of $u$ about the point $P$ at which
$u = f(x_0, y_0)$, we shall start by translating our coordinate axes without
rotation to the point $P$. In this position the new $x$, $y$, and $u$ coordinate axes
will be denoted by $x'$, $y'$, and $u'$, respectively, as shown in Fig. 5-15.

If, relative to $P$, the $x'$ and $y'$ coordinates of a point $P'$ are $\Delta x$ and $\Delta y$,
then it is obvious from Fig. 5-15 that the increment $du$ must be

$$du = \Delta x \tan\alpha + \Delta y \tan\beta,$$

where $\alpha$ and $\beta$ are the angles between the lines $l_1$ and $l_2$ and the $x'$- and $y'$-axes,
respectively.

However, by the definition of $f_x$ and $f_y$, we have

$$f_x(x_0, y_0) = \tan\alpha, \qquad f_y(x_0, y_0) = \tan\beta,$$

so that

$$du = f_x(x_0, y_0)\,\Delta x + f_y(x_0, y_0)\,\Delta y. \qquad \text{(5-19)}$$

Fig. 5-14 Tangent plane $\Pi$ to surface $u = f(x, y)$ at point $P$.

We now define differentials $dx$ and $dy$ in the independent variables $x$ and $y$
by setting $dx = \Delta x$ and $dy = \Delta y$. Expression (5-19) then becomes

$$du = f_x(x_0, y_0)\,dx + f_y(x_0, y_0)\,dy, \qquad \text{(5-20)}$$

which is the relationship by which we define the total differential $du$ of the
function $u = f(x, y)$. This is so called because it takes account of the total
effect, on $u$, of the changes $dx$ in $x$ and $dy$ in $y$. The additive effect of these
changes is clearly apparent in Fig. 5-15 and results from using a tangent plane
approximation to the surface near $P$. As before, when $dx$ and $dy$ are suitably
small, $du$ is a reasonable approximation to the true change $\Delta u$ given by

$$\Delta u = f(x_0 + dx, y_0 + dy) - f(x_0, y_0). \qquad \text{(5-21)}$$
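To see how good the tangent plane approximation is in practice, the following Python sketch compares the total differential of Eqn (5-20) with the exact change of Eqn (5-21). It is an illustration only: the function is borrowed from Example 5-16 (a), and the point (1, 2) and the step sizes are arbitrary choices made here rather than values from the text.

```python
def f(x, y):
    return x**3 + 2*x*y + 2*y**2      # function of Example 5-16 (a)

def total_differential(x0, y0, dx, dy):
    """du = f_x dx + f_y dy, Eqn (5-20), using f_x = 3x^2 + 2y and f_y = 2x + 4y."""
    fx = 3*x0**2 + 2*y0
    fy = 2*x0 + 4*y0
    return fx*dx + fy*dy

def true_change(x0, y0, dx, dy):
    """Exact change in u, Eqn (5-21)."""
    return f(x0 + dx, y0 + dy) - f(x0, y0)

x0, y0 = 1.0, 2.0                      # illustrative point
for step in (0.1, 0.01, 0.001):        # take dx = dy = step
    du = total_differential(x0, y0, step, step)
    delta_u = true_change(x0, y0, step, step)
    print(step, du, delta_u, abs(delta_u - du))
```

For this polynomial the discrepancy works out to $7s^2 + s^3$ when $dx = dy = s$, so reducing the step by a factor of ten reduces the error by roughly a factor of one hundred, as expected of a first-order approximation.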

An analytic rather than geometric justification of the tangent plane
approximation used to define $du$ in Eqn (5-20) can be based on Theorem 5-12.
Equation (5-21), which is exact, is taken to be the starting point and, by
addition and subtraction of a term $f(x_0, y_0 + \Delta y)$, is written

$$\Delta u = [f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0, y_0 + \Delta y)] + [f(x_0, y_0 + \Delta y) - f(x_0, y_0)],$$

where the first bracket is a function only of $x$ and the second bracket is a
function only of $y$.

Then Theorem 5-12 expressed in the Cauchy form may be applied to the
first bracket with respect to $x$ and to the second bracket with respect to $y$ to
yield

$$\Delta u = \Delta x\, f_x(x_0 + \xi\,\Delta x, y_0 + \Delta y) + \Delta y\, f_y(x_0, y_0 + \eta\,\Delta y), \qquad \text{(5-22)}$$

where $0 < \xi < 1$ and $0 < \eta < 1$. Partial derivatives have been used here
because, although in the first bracket it is only $x$ that varies whilst in the second
bracket it is only $y$ that varies, both brackets are nevertheless functions of
$x$ and $y$.

Fig. 5-15 Element of tangent plane.

Result (5-20) then follows by letting $\Delta x$ and $\Delta y$ become small. The
continuity of $f_x(x_0 + \xi\,\Delta x, y_0 + \Delta y)$ allows it to be approximated by
$f_x(x_0, y_0)$ with an error $\varepsilon_1$ and, similarly, the continuity of $f_y(x_0, y_0 + \eta\,\Delta y)$
allows it to be approximated by $f_y(x_0, y_0)$ with an error $\varepsilon_2$. Then, as $\Delta x$,
$\Delta y \to 0$, so also do $\varepsilon_1$ and $\varepsilon_2$. It is left as an exercise for the reader to supply
the details necessary to make this argument rigorous. If Eqn (5-20) is defined
for all points $(x_0, y_0)$ of some region in the $(x, y)$-plane, then the suffix zero
may be discarded and Eqn (5-20) can then be regarded as a functional relationship
rather than a result that is true only near one point.

We have thus proved a special case of the following more general result 
whose proof differs in no significant detail. 



Theorem 5-19 (total differential) Let $f(x_1, x_2, \ldots, x_n)$ be a real valued
function of $n$ real variables and let its first-order partial derivatives exist and
be continuous in some region $\mathcal{R}$. Then the total differential $du$ of the function
$u = f(x_1, x_2, \ldots, x_n)$ in the region $\mathcal{R}$ is given by

$$du = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n.$$



If we consider the surface generated by setting $u$ = constant, then on that
surface $du = 0$. Theorem 5-19 then takes the form

$$0 = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n, \qquad \text{(5-23)}$$

showing that the differentials $dx_1, dx_2, \ldots, dx_n$ are no longer independent
since this constraint condition has been imposed on them. This is of course
to be expected, since we have imposed the single condition $f(x_1, x_2, \ldots, x_n)$
= constant on the independent variables $x_1, x_2, \ldots, x_n$ so that we are no
longer free to change them arbitrarily. Indeed, if differentials $dx_1, dx_2, \ldots,
dx_{n-1}$ are chosen arbitrarily, then the remaining differential $dx_n$ is uniquely
determined by Eqn (5-23) (provided that $\partial f/\partial x_n \neq 0$). If we call the number of
independent variables the number of degrees of freedom associated with the
equation $u = f(x_1, x_2, \ldots, x_n)$, then Eqn (5-23) implies the loss of a single
degree of freedom.

Example 5-18 In thermodynamics, the pressure $p$ of an ideal gas, its volume
$V$, its absolute temperature $T$ and the gas constant $R$ are related by the ideal
gas law $pV = RT$. Find the expression relating the total differential $dp$ and
the differentials $dV$ and $dT$.

Solution We have $p = RT/V$, and so $p = f(T, V)$ with $f(T, V) = RT/V$.
Hence $\partial f/\partial T = R/V$ and $\partial f/\partial V = -RT/V^2$. Now interpreting Theorem 5-19
in this case we find

$$dp = \left(\frac{\partial f}{\partial T}\right) dT + \left(\frac{\partial f}{\partial V}\right) dV, \qquad (*)$$

and so

$$dp = \frac{R}{V}\,dT - \frac{RT}{V^2}\,dV.$$

Notice that the use of the symbol $f$ in the total differential relation (*) to
bring it into accord with the notation of Theorem 5-19 is not strictly necessary
since $p = f$. We could equally well have written equation (*) as

$$dp = \left(\frac{\partial p}{\partial T}\right) dT + \left(\frac{\partial p}{\partial V}\right) dV,$$

and used the immediately obvious result that

$$\frac{\partial p}{\partial T} = \frac{R}{V} \quad \text{and} \quad \frac{\partial p}{\partial V} = -\frac{RT}{V^2}.$$
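As a rough numerical illustration of Example 5-18, the Python sketch below compares the total differential $dp = (R/V)\,dT - (RT/V^2)\,dV$ with the exact pressure change. The numerical values are assumptions introduced here and do not come from the text: one mole of gas is assumed so that $R$ can be taken as 8.314 J/(mol K), and the state and increments are hypothetical.

```python
R = 8.314  # J/(mol K); one mole of gas is assumed for this illustration

def pressure(T, V):
    """Ideal gas law p = R T / V."""
    return R * T / V

def dp(T, V, dT, dV):
    """Total differential of Example 5-18: dp = (R/V) dT - (R T / V^2) dV."""
    return (R / V) * dT - (R * T / V**2) * dV

T, V = 300.0, 0.025          # hypothetical state: kelvin, cubic metres
dT, dV = 0.5, 1.0e-5         # hypothetical small changes in T and V

approx = dp(T, V, dT, dV)
exact = pressure(T + dT, V + dV) - pressure(T, V)
print(approx, exact, abs(approx - exact) / abs(exact))
```

With these increments the two values should agree to within a small fraction of one per cent, which is the sense in which $dp$ approximates the true change when $dT$ and $dV$ are small.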

Let us now consider the function $u = f(x, y)$ and, as a special case, set
$u = 0$ so that the equation

$$f(x, y) = 0$$

defines $y$ implicitly in terms of $x$. How then may we compute the derivative
$dy/dx$ without solving for $y$ in terms of $x$? The solution to this problem is
provided by Eqn (5-23), which in this case takes the form

$$0 = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy.$$

We saw in connection with the definition of the differentials $dy$ and $dx$ in
Eqn (5-11), that the function $(dy/dx)$, called the derivative of $y$ with respect
to $x$, is the ratio $dy : dx$ of the differentials. Hence dividing by the differential
$dx$, assuming that $\partial f/\partial y \neq 0$, and rearranging gives the result

$$\frac{dy}{dx} = \frac{-(\partial f/\partial x)}{(\partial f/\partial y)}.$$

We state this as a corollary to Theorem 5-19.

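Corollary 5-19 (a) If the real variables $x$ and $y$ are related implicitly by the
equation $f(x, y) = 0$, and the partial derivatives $\partial f/\partial x$ and $\partial f/\partial y$ exist and
are continuous, then

$$\frac{dy}{dx} = -\left(\frac{\partial f}{\partial x}\right) \bigg/ \left(\frac{\partial f}{\partial y}\right)$$

whenever $\partial f/\partial y \neq 0$. Insistence on this latter condition may be avoided by
writing the result in the alternative form

$$\left(\frac{\partial f}{\partial y}\right)\frac{dy}{dx} + \frac{\partial f}{\partial x} = 0.$$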

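The situation is slightly different if three variables $x$, $y$, $z$ are involved and
$z$, say, is defined implicitly in terms of the independent variables $x$ and $y$ by
the equation

$$f(x, y, z) = 0.$$

In these circumstances it is frequently necessary to compute $\partial z/\partial x$ and
$\partial z/\partial y$ from this implicit relationship. To do so, notice that an obvious
modification of Eqn (5-23) gives

$$0 = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial z}\,dz,$$

but if $z$ could be obtained explicitly, so that $z = z(x, y)$, it would also follow
from Theorem 5-19 that

$$dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy.$$

Substitution of this result into the above expression gives

$$\left(\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x}\right) dx + \left(\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y}\right) dy = 0,$$

and as $x$ and $y$ are independent variables, $dx$ and $dy$ are arbitrary so that this
expression can only be true if

$$\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x} = 0 \quad \text{and} \quad \frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y} = 0.$$

Hence, we find that provided $\partial f/\partial z \neq 0$,

$$\frac{\partial z}{\partial x} = -\left(\frac{\partial f}{\partial x}\right) \bigg/ \left(\frac{\partial f}{\partial z}\right), \qquad \frac{\partial z}{\partial y} = -\left(\frac{\partial f}{\partial y}\right) \bigg/ \left(\frac{\partial f}{\partial z}\right).$$

We state this in the form of a further corollary.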

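Corollary 5-19 (b) If the real variables $x$, $y$, and $z$ are related by the implicit
equation $f(x, y, z) = 0$ and the first-order derivatives of $f$ exist and are
continuous, then

$$\frac{\partial z}{\partial x} = -\left(\frac{\partial f}{\partial x}\right) \bigg/ \left(\frac{\partial f}{\partial z}\right), \qquad \frac{\partial z}{\partial y} = -\left(\frac{\partial f}{\partial y}\right) \bigg/ \left(\frac{\partial f}{\partial z}\right),$$

when $\partial f/\partial z \neq 0$.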

Example 5-19

(a) Find $dy/dx$ given that $x^2 y + \sin xy = 0$.

(b) Prove that $(d/dx)(x^r) = r x^{r-1}$ when $r$ is rational.

(c) Find $\partial z/\partial x$ and $\partial z/\partial y$ given that $f(x, y, z) = 0$, where
$f(x, y, z) = x^2 + 2xyz + z^3$.

Solution (a) We must apply Corollary 5-19 (a). As, in this case,

$$f(x, y) = x^2 y + \sin xy$$

it follows that

$$\frac{\partial f}{\partial x} = 2xy + y \cos xy$$

and

$$\frac{\partial f}{\partial y} = x^2 + x \cos xy.$$

Hence, by Corollary 5-19 (a),

$$\frac{dy}{dx} = \frac{-(\partial f/\partial x)}{(\partial f/\partial y)} = -\left(\frac{2xy + y \cos xy}{x^2 + x \cos xy}\right)$$

whenever $x^2 + x \cos xy \neq 0$.

(b) We have already shown in Theorem 5-2 that if $y = x^n$, then $dy/dx
= n x^{n-1}$ for $n$ a positive or negative integer. Now we must show this result
is still true if the power involved is rational.

Let $y = x^r$ with $r = p/q$, where $p$ and $q$ are integers without any common
factor. Then $y = x^{p/q}$ implies, and is implied by, $y^q = x^p$. Let $f(x, y) = y^q - x^p$
so that our equation corresponds to $f(x, y) = 0$. Then there clearly exist
pairs of real numbers $(x, y)$ for which $y^q = x^p$, and by Theorem 5-2, $\partial f/\partial y
= q y^{q-1} \neq 0$ when $y \neq 0$ (that is, when $x \neq 0$), and both $\partial f/\partial y$ and $\partial f/\partial x
= -p x^{p-1}$ are continuous functions. Hence the conditions of Corollary
5-19 (a) are satisfied so that by the second form of its statement we may write

$$q y^{q-1}\frac{dy}{dx} - p x^{p-1} = 0.$$

Thus

$$\frac{dy}{dx} = \frac{p}{q}\,\frac{x^{p-1}}{y^{q-1}} = \frac{p}{q}\,\frac{x^{p-1}}{(x^{p/q})^{q-1}} = \frac{p}{q}\,x^{(p/q)-1}$$

when $x \neq 0$. In the event that $x = 0$ we have

$$\left.\frac{d}{dx}\,(x^{p/q})\right|_{x=0} = \lim_{x \to 0} \frac{x^{p/q} - 0}{x}$$

whenever this limit exists, which it does when $p/q > 1$, and is then equal to
zero. This establishes our desired result for all $x$.
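(c) Here,

$$f(x, y, z) = x^2 + 2xyz + z^3$$

and so

$$\frac{\partial f}{\partial x} = 2x + 2yz, \qquad \frac{\partial f}{\partial y} = 2xz, \qquad \frac{\partial f}{\partial z} = 2xy + 3z^2.$$

Thus by Corollary 5-19 (b),

$$\frac{\partial z}{\partial x} = -\left(\frac{2x + 2yz}{2xy + 3z^2}\right) \quad \text{and} \quad \frac{\partial z}{\partial y} = \frac{-2xz}{2xy + 3z^2}.$$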
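The argument of part (b) lends itself to a quick numerical check. The Python sketch below is an illustration only, with hypothetical function names and an arbitrarily chosen rational power $r = 3/5$ at $x = 2$. It evaluates the slope given by Corollary 5-19 (a) for $f(x, y) = y^q - x^p$ and compares it with $r x^{r-1}$. Only $x > 0$ is considered so that all powers are real.

```python
def implicit_slope(fx, fy):
    """dy/dx = -(df/dx)/(df/dy), Corollary 5-19 (a); needs df/dy != 0."""
    if fy == 0:
        raise ZeroDivisionError("df/dy vanishes, so the corollary does not apply")
    return -fx / fy

def slope_of_rational_power(x, p, q):
    """Slope of y = x**(p/q), obtained from f(x, y) = y**q - x**p = 0 with x > 0."""
    y = x ** (p / q)
    fx = -p * x ** (p - 1)     # partial derivative of f with respect to x
    fy = q * y ** (q - 1)      # partial derivative of f with respect to y
    return implicit_slope(fx, fy)

x, p, q = 2.0, 3, 5            # illustrative values, not taken from the text
r = p / q
print(slope_of_rational_power(x, p, q), r * x ** (r - 1))
```

The two printed numbers should coincide to within rounding error, as the result $dy/dx = (p/q)\,x^{(p/q)-1}$ requires.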



5-7 Envelopes 

A simple and useful application of the total differential is to the problem of 
the determination of envelopes already touched upon in Section 2-5. Before 
proceeding with this application we now formally define an envelope. 

Definition 5-6 Let a family of curves $\Gamma$ in the $(x, y)$-plane with parameter
$\alpha$ be defined by the implicit equation

$$f(x, y, \alpha) = 0.$$

Then the envelope of the family $\Gamma$, when it exists, is that curve $\mathcal{E}$ which is
tangent to every member of the family.

Figure 5-16 (a) shows some representative members of the family $\Gamma$
corresponding to values $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_n$ of the parameter $\alpha$. Figure 5-16 (b)
shows the same situation on closely neighbouring curves $C_1$ and $C_2$ when the
parametric value for $C_2$ is $\alpha_0 + d\alpha$ which differs only by the differential $d\alpha$
from the parametric value $\alpha_0$ appropriate to $C_1$. We shall assume that the
curves $C_1$ and $C_2$ intersect at the point $P$ with coordinates $(x_0, y_0)$.




Fig. 5-16 Construction of envelope: (a) envelope of family of curves; (b) neighbouring
members of the family.

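Setting $u = f(x, y, \alpha)$, and regarding $x$, $y$, and $\alpha$ as variables, it follows
from Theorem 5-19 that

$$du = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial \alpha}\,d\alpha$$

and as the family is defined by setting $u = 0$ (constant) it then follows, as in
Eqn (5-23), that

$$0 = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial \alpha}\,d\alpha.$$

This equation which relates the differentials $dx$, $dy$, and $d\alpha$ to the neighbouring
curves $C_1$ and $C_2$ is, in particular, true at $P$. We signify this by
writing

$$0 = \left(\frac{\partial f}{\partial x}\right)_P dx + \left(\frac{\partial f}{\partial y}\right)_P dy + \left(\frac{\partial f}{\partial \alpha}\right)_P d\alpha, \qquad \text{(5-24)}$$

where $(\cdot)_P$ denotes that the associated quantity is to be evaluated at $P$.
This equation is just the intersection condition for curves $C_1$ and $C_2$ at $P$.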
As it is required of the envelope $\mathcal{E}$ that it be tangent to every member of
the family $\Gamma$ it follows that as $d\alpha \to 0$, so curve $C_1$ must tend to $C_2$ and the
gradient of the envelope $\mathcal{E}$ at $P$ must tend to the gradient of the tangent to
$C_1$ at $P$. To compute this we use the fact that $\alpha = \alpha_0$ is constant for curve $C_1$
so that the argument that gave rise to Eqn (5-24), when applied to
$f(x, y, \alpha_0) = 0$ gives the tangency condition

$$0 = \left(\frac{\partial f}{\partial x}\right)_P dx + \left(\frac{\partial f}{\partial y}\right)_P dy. \qquad \text{(5-25)}$$

Now both Eqns (5-24) and (5-25) must be simultaneously true for $\mathcal{E}$ and,
consequently, we arrive at the condition

$$\left(\frac{\partial f}{\partial \alpha}\right)_P d\alpha = 0,$$

which, since in general $d\alpha$ is a non-zero differential, can only be true if

$$\left(\frac{\partial f}{\partial \alpha}\right)_P = 0. \qquad \text{(5-26)}$$

In addition to this result, the fact that $P$ is a point on $C_1$ implies that
$f(x_0, y_0, \alpha_0) = 0$ or, equivalently, that

$$[f(x, y, \alpha)]_P = 0. \qquad \text{(5-27)}$$

Both conditions (5-26) and (5-27) must be satisfied if the envelope $\mathcal{E}$ is to
pass through $P$ and be tangent to $C_1$ at that point, so that dropping the suffix
$P$, we see that $\mathcal{E}$ is the locus of all points for which

$$f(x, y, \alpha) = 0 \quad \text{and} \quad \frac{\partial}{\partial \alpha} f(x, y, \alpha) = 0. \qquad \text{(5-28)}$$

Elimination of $\alpha$ between these two equations gives a relationship between
$x$ and $y$ which is the desired equation of the envelope $\mathcal{E}$. We have thus proved
the following result.



Theorem 5-20 (envelopes) When it exists, the equation of the envelope
$\mathcal{E}$ of the family of curves

$$f(x, y, \alpha) = 0$$

with parameter $\alpha$ is determined by the elimination of $\alpha$ between the equations

$$f(x, y, \alpha) = 0 \quad \text{and} \quad \frac{\partial}{\partial \alpha} f(x, y, \alpha) = 0.$$

Example 5-20 Determine the envelope $\mathcal{E}$ of the family of curves

$$(x - \alpha)^2 + (y + \alpha)^2 = 1,$$

with parameter $\alpha$.

Fig. 5-17 Envelope of circles.

Solution If we write the equation of this family of curves in the form

$$f(x, y, \alpha) = 0,$$

then we must set

$$f(x, y, \alpha) = (x - \alpha)^2 + (y + \alpha)^2 - 1.$$

Hence the equation $\partial f/\partial \alpha = 0$ corresponds to

$$-(x - \alpha) + (y + \alpha) = 0$$

or, equivalently, to

$$\alpha = \tfrac{1}{2}(x - y).$$

To determine the envelope, the conditions of Theorem 5-20 require that
$f(x, y, \alpha) = 0$ simultaneously with $\partial f/\partial \alpha = 0$. Hence substituting the
value of the parameter $\alpha$ obtained above from the condition $\partial f/\partial \alpha = 0$ into the
family of curves $f(x, y, \alpha) = 0$ gives

$$\tfrac{1}{4}(x + y)^2 + \tfrac{1}{4}(x + y)^2 = 1$$

or,

$$x + y = \pm\sqrt{2}.$$

The desired envelope $\mathcal{E}$ thus comprises the two straight lines
$y = \sqrt{2} - x$ and $y = -\sqrt{2} - x$.

This result could also have been deduced by geometrical arguments as
follows. The original family of curves comprises circles of unit radius, each
with its centre at $x = \alpha$, $y = -\alpha$. Consequently, the tangents to these
circles which form their envelope $\mathcal{E}$ must be straight lines parallel to the line
of centres $y = -x$ and separated from it by a unit distance (Fig. 5-17).
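The geometrical argument can be confirmed numerically. The Python sketch below is an illustration with hypothetical helper names and arbitrarily chosen values of $\alpha$. For several members of the family it checks that the centre $(\alpha, -\alpha)$ lies at unit distance from each of the lines $x + y = \pm\sqrt{2}$, and that the point of contact implied by $\alpha = \tfrac{1}{2}(x - y)$ lies on both the circle and the line.

```python
import math

def distance_to_line(px, py, c):
    """Perpendicular distance from the point (px, py) to the line x + y = c."""
    return abs(px + py - c) / math.sqrt(2.0)

def contact_point(alpha, sign):
    """Point at which the circle with parameter alpha touches x + y = sign*sqrt(2).
    It follows from alpha = (x - y)/2 together with x + y = sign*sqrt(2)."""
    half = sign * math.sqrt(2.0) / 2.0
    return (alpha + half, -alpha + half)

def circle_residual(x, y, alpha):
    """f(x, y, alpha) = (x - alpha)^2 + (y + alpha)^2 - 1; zero on the circle."""
    return (x - alpha)**2 + (y + alpha)**2 - 1.0

for alpha in (-3.0, -0.5, 0.0, 1.25, 10.0):     # illustrative parameter values
    for sign in (1, -1):
        c = sign * math.sqrt(2.0)
        x, y = contact_point(alpha, sign)
        # Distance should be 1 (the radius); the residual should be 0.
        print(alpha, sign, distance_to_line(alpha, -alpha, c), circle_residual(x, y, alpha))
```

A distance equal to the radius means that each line meets each circle in exactly one point, which is the tangency required of an envelope.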

Although in this case it was possible to eliminate $\alpha$ from the equations
arising from Theorem 5-20, this is not generally possible. In the
next example we illustrate how on occasions $\alpha$ may be retained in a form
which allows the equation of the envelope to be expressed in parametric form.

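Example 5-21 Find the envelope of the equation

$$(x - \alpha)^2 + y^2 = \frac{\alpha^2}{1 + \alpha^2},$$

where $\alpha$ is a parameter.

Solution We again write the equation in the form

$$f(x, y, \alpha) = 0,$$

where this time

$$f(x, y, \alpha) = (x - \alpha)^2 + y^2 - \frac{\alpha^2}{1 + \alpha^2}.$$

Then

$$\frac{\partial f}{\partial \alpha} = -2(x - \alpha) - \frac{2\alpha}{1 + \alpha^2} + \frac{2\alpha^3}{(1 + \alpha^2)^2}$$

and hence the condition $\partial f/\partial \alpha = 0$ requires that

$$(x - \alpha) = \frac{\alpha^3}{(1 + \alpha^2)^2} - \frac{\alpha}{1 + \alpha^2}.$$

Now this is a specially simple situation because $y$ is absent from the equation
$\partial f/\partial \alpha = 0$ which allows us to solve immediately for $x$ in terms of $\alpha$ to get

$$x = \alpha^3\left[\frac{2 + \alpha^2}{(1 + \alpha^2)^2}\right]. \qquad \text{(A)}$$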



To find the envelope $\mathcal{E}$, Theorem 5-20 requires that in addition to satisfying
the condition $\partial f/\partial \alpha = 0$ we must also require that $f(x, y, \alpha) = 0$.
Using the form of $(x - \alpha)$ given above this is easily seen to be equivalent to
requiring that

$$\left[\frac{\alpha^3}{(1 + \alpha^2)^2} - \frac{\alpha}{1 + \alpha^2}\right]^2 + y^2 = \frac{\alpha^2}{1 + \alpha^2}.$$

This may now be solved for $y$ in terms of $\alpha$ to obtain

$$y = \frac{\pm\alpha^2 (3 + 3\alpha^2 + \alpha^4)^{1/2}}{(1 + \alpha^2)^2}. \qquad \text{(B)}$$

The coordinates $(x, y)$ of points on envelope $\mathcal{E}$ are thus determined in
terms of $\alpha$ by equations (A) and (B). Although it is not possible to eliminate
$\alpha$ between these equations to obtain an explicit representation for the envelope
$\mathcal{E}$ in terms of $x$ and $y$, this is of no real importance as we have obtained the
equations of $\mathcal{E}$ in parametric form, which is equally satisfactory. Different
values of $\alpha$ will determine different points $(x(\alpha), y(\alpha))$ on the envelope $\mathcal{E}$.

This example has in fact provided the detailed solution to the problem
first studied in Section 2-5. Notice that for large values of $\alpha$ we have $x \to \alpha$
and $y \to \pm 1$, as was deduced from purely geometrical considerations when
the problem was first examined.
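Because the envelope of Example 5-21 is available only in parametric form, a numerical check is a convenient way of confirming equations (A) and (B). The Python sketch below is an illustration only, with function names chosen here and arbitrary sample values of $\alpha$. It substitutes the parametric point into both conditions of Theorem 5-20 and also looks at the behaviour for large $\alpha$.

```python
import math

def f(x, y, a):
    """Family of Example 5-21 written as f(x, y, alpha) = 0."""
    return (x - a)**2 + y**2 - a**2 / (1 + a**2)

def df_da(x, y, a):
    """Partial derivative of f with respect to the parameter alpha."""
    return -2*(x - a) - 2*a / (1 + a**2) + 2*a**3 / (1 + a**2)**2

def envelope_point(a, sign=1):
    """Point (x, y) of the envelope from equations (A) and (B); sign selects the branch."""
    x = a**3 * (2 + a**2) / (1 + a**2)**2
    y = sign * a**2 * math.sqrt(3 + 3*a**2 + a**4) / (1 + a**2)**2
    return x, y

for a in (0.5, 1.0, 3.0, 100.0):               # illustrative parameter values
    x, y = envelope_point(a)
    # Both residuals should vanish to within rounding error.
    print(a, x, y, f(x, y, a), df_da(x, y, a))
```

For the largest value of $\alpha$ the printed point should be close to $(\alpha, 1)$, in keeping with the limiting behaviour $x \to \alpha$, $y \to \pm 1$ noted above.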

5-8 The chain rule and its consequences 

If, in Theorem 5-19, the variables $x_1, x_2, \ldots, x_n$ are specified in terms of a
parameter $t$, say, then the result requires slight modification. Suppose that

$$x_1 = x_1(t), \quad x_2 = x_2(t), \quad \ldots, \quad x_n = x_n(t),$$

which are all differentiable functions of $t$. Then the variable $u$ becomes a
function of the single real variable $t$ for we may write

$$u = F(t), \qquad \text{(5-29)}$$

where $F(t) = f(x_1(t), x_2(t), \ldots, x_n(t))$.

Hence by an obvious adaptation of Eqn (5-11) defining differentials we
may write

$$du = F'(t)\,dt, \qquad \text{(5-30)}$$

where, of course, $F'(t) = du/dt$, the derivative of $u$ with respect to $t$.

However by a further application of Eqn (5-11) to each of the variables
$x_1 = x_1(t), x_2 = x_2(t), \ldots, x_n = x_n(t)$ we have the result

$$dx_1 = \left(\frac{dx_1}{dt}\right) dt, \quad dx_2 = \left(\frac{dx_2}{dt}\right) dt, \quad \ldots, \quad dx_n = \left(\frac{dx_n}{dt}\right) dt. \qquad \text{(5-31)}$$

Substituting these expressions for the differentials $dx_i$ in terms of the
differential $dt$ into the statement of Theorem 5-19 gives

$$du = \left(\frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt}\right) dt. \qquad \text{(5-32)}$$

Finally, a comparison of Eqns (5-30) and (5-32) shows that

$$F'(t) = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt}.$$

As $F'(t) = du/dt$, this result facilitates the calculation of $du/dt$ without the
need for formal substitution into $u = f(x_1, x_2, \ldots, x_n)$ of the values

$$x_1 = x_1(t), \quad x_2 = x_2(t), \quad \ldots, \quad x_n = x_n(t).$$

We have proved the following useful result.

Theorem 5-21 (chain rule for partial derivatives) Let $u = f(x_1, x_2, \ldots, x_n)$
be a real valued function of $n$ real variables and let its first-order partial
derivatives exist and be continuous. Further, let each of the variables $x_1, x_2,
\ldots, x_n$ be a differentiable function of the single real variable $t$ so that we
may write

$$x_1 = x_1(t), \quad x_2 = x_2(t), \quad \ldots, \quad x_n = x_n(t).$$

Then the total derivative of $u$ with respect to $t$ is given by

$$\frac{du}{dt} = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt}.$$

Two special cases of this theorem are of sufficient importance to merit
recording as corollaries. The first arises when $f$ is a function of only two
variables between which an explicit relationship exists, and the parameter $t$ is
identified with one of these variables.

As only two variables are involved we shall avoid the use of numerical
suffixes by agreeing to write $x_1 = x$ and $x_2 = y$ where, by supposition,
$y = y(x)$ is some known explicit relation. The statement of Theorem 5-21
then becomes

$$\frac{du}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$

If, now, we identify $t$ with $x$, then $t = x$ and $dx/dt = 1$, $dy/dt = dy/dx$ so
that the above result becomes

$$\frac{du}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}.$$

The expression on the right-hand side is the total derivative of $u$ with respect
to $x$. The first term on the right takes account of the change directly due to $x$
whilst the second term takes account of the fact that $y$ is itself a function of $x$.
This result enables $du/dx$ to be obtained without needing to substitute
$y = y(x)$ in the relation $u = f(x, y)$.

Corollary 5-21 (a) If $u = f(x, y)$ is a real valued function of the real
variables $x$ and $y$ with continuous first-order derivatives and $y$ is related to
$x$ by the explicit equation $y = y(x)$, then

$$\frac{du}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}.$$

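More generally, suppose that $u = f(x, y)$ whilst $x$ and $y$ are related
implicitly by the equation

$$g(x, y) = 0.$$

How must we modify our previous argument in order that we may compute
the total derivative $du/dx$? The result of Corollary 5-21 (a) is still true but
obviously $dy/dx$ now depends on the form of $g$. To find the form of $dy/dx$ we
can use Corollary 5-19 (a), with $g$ in place of $f$, to see that

$$\frac{dy}{dx} = -\left(\frac{\partial g}{\partial x}\right) \bigg/ \left(\frac{\partial g}{\partial y}\right),$$

showing that

$$\frac{du}{dx} = \frac{\partial f}{\partial x} - \left(\frac{\partial f}{\partial y}\right)\left(\frac{\partial g}{\partial x}\right) \bigg/ \left(\frac{\partial g}{\partial y}\right),$$

provided $\partial g/\partial y \neq 0$. We state this as our next result.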



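Corollary 5-21 (b) If $u = f(x, y)$ is a real valued function of the real variables
$x$ and $y$ with continuous first-order derivatives, and $y$ is related implicitly to
$x$ by the equation $g(x, y) = 0$, then

$$\frac{du}{dx} = \frac{\partial f}{\partial x} - \left(\frac{\partial f}{\partial y}\right)\left(\frac{\partial g}{\partial x}\right) \bigg/ \left(\frac{\partial g}{\partial y}\right),$$

provided $\partial g/\partial y \neq 0$.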



Example 5-22 Determine the derivative $du/dt$ given that
$u = \sin(x^2 + y^2)$ with $x = 3t$, $y = 1/(1 + t^2)$.

Solution We must apply Theorem 5-21 making the identifications $x_1 = x$,
$x_2 = y$, and $f(x, y) = \sin(x^2 + y^2)$ with $x = 3t$ and $y = 1/(1 + t^2)$. Hence

$$\frac{\partial f}{\partial x} = 2x \cos(x^2 + y^2), \qquad \frac{\partial f}{\partial y} = 2y \cos(x^2 + y^2),$$

whilst

$$\frac{dx}{dt} = 3, \qquad \frac{dy}{dt} = \frac{-2t}{(1 + t^2)^2}.$$

Substituting in Theorem 5-21,

$$\frac{du}{dt} = 2x \cos(x^2 + y^2) \cdot (3) + 2y \cos(x^2 + y^2)\left[\frac{-2t}{(1 + t^2)^2}\right]$$

or

$$\frac{du}{dt} = 2 \cos(x^2 + y^2)\left[3x - \frac{2yt}{(1 + t^2)^2}\right].$$

Using the known relationships between $x$, $y$, and $t$, the derivative $du/dt$ can
thus be computed for any desired value of $t$. The details are left to the reader.
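One way of carrying out the computation left to the reader, and of checking the chain rule at the same time, is sketched below in Python. The function names, the sample values of $t$ and the central difference step are assumptions made for this illustration only. The chain-rule expression is compared with a direct numerical derivative of $u$ regarded as a function of $t$ alone.

```python
import math

def u_of_t(t):
    """u = sin(x^2 + y^2) after substituting x = 3t and y = 1/(1 + t^2)."""
    x = 3*t
    y = 1 / (1 + t**2)
    return math.sin(x**2 + y**2)

def du_dt(t):
    """Chain-rule result of Example 5-22."""
    x = 3*t
    y = 1 / (1 + t**2)
    return 2 * math.cos(x**2 + y**2) * (3*x - 2*y*t / (1 + t**2)**2)

def numeric_derivative(g, t, h=1e-6):
    """Central-difference estimate of dg/dt (illustrative step size)."""
    return (g(t + h) - g(t - h)) / (2*h)

for t in (0.0, 0.7, -1.3):                      # illustrative values of t
    print(t, du_dt(t), numeric_derivative(u_of_t, t))
```

The two columns should agree to several decimal places. The small residual difference is the truncation and rounding error of the difference quotient, not a defect of the chain rule.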

Example 5-23 Determine the total derivative $du/dx$ in each case:

(a) $u = x \cos y + y \cos x$ when $y = 1 + x + x^3$;

(b) $u = x^2 + 2xy - y^2$ when $x^2 + y^2 + \cos xy = 0$.

Solution (a) This requires an application of Corollary 5-21 (a). We set

$$f(x, y) = x \cos y + y \cos x \quad \text{and} \quad y = 1 + x + x^3$$

so that

$$\frac{\partial f}{\partial x} = \cos y - y \sin x, \qquad \frac{\partial f}{\partial y} = -x \sin y + \cos x$$

and

$$\frac{dy}{dx} = 1 + 3x^2.$$

Hence, substituting into Corollary 5-21 (a),

$$\frac{du}{dx} = \cos y - y \sin x + (\cos x - x \sin y)(1 + 3x^2).$$

(b) In this case we use Corollary 5-21 (b), with

$$f(x, y) = x^2 + 2xy - y^2 \quad \text{and} \quad g(x, y) = x^2 + y^2 + \cos xy.$$

Hence

$$\frac{\partial f}{\partial x} = 2x + 2y, \qquad \frac{\partial f}{\partial y} = 2x - 2y,$$

$$\frac{\partial g}{\partial x} = 2x - y \sin xy, \qquad \frac{\partial g}{\partial y} = 2y - x \sin xy.$$

Finally, applying Corollary 5-21 (b),

$$\frac{du}{dx} = 2(x + y) - \frac{2(x - y)(2x - y \sin xy)}{(2y - x \sin xy)}.$$
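Part (a) can be checked in the same spirit as the previous example. The Python sketch below is an illustration with names and sample points chosen here. It evaluates the total derivative from Corollary 5-21 (a) and compares it with a numerical derivative of $u$ after $y = 1 + x + x^3$ has been substituted.

```python
import math

def y_of_x(x):
    return 1 + x + x**3

def u_of_x(x):
    """u = x cos y + y cos x with y = 1 + x + x^3 substituted."""
    y = y_of_x(x)
    return x * math.cos(y) + y * math.cos(x)

def total_derivative(x):
    """du/dx = f_x + f_y dy/dx, the result of Example 5-23 (a)."""
    y = y_of_x(x)
    f_x = math.cos(y) - y * math.sin(x)
    f_y = math.cos(x) - x * math.sin(y)
    return f_x + f_y * (1 + 3*x**2)

def numeric_derivative(g, x, h=1e-6):
    """Central-difference estimate of dg/dx (illustrative step size)."""
    return (g(x + h) - g(x - h)) / (2*h)

for x in (0.0, 0.4, -1.1):                      # illustrative values of x
    print(x, total_derivative(x), numeric_derivative(u_of_x, x))
```

The first term of `total_derivative` is the change due directly to $x$, and the second accounts for the dependence of $y$ on $x$. Omitting the second term would make the two printed columns disagree.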



5-9 Change of variable

This section discusses a somewhat more complicated situation than that
covered by Theorem 5-21, namely, the implications on partial differentiation
of changing the independent variables in a function $u = f(x_1, x_2, \ldots, x_n)$
that is to be differentiated. This situation commonly occurs as a result of
changing coordinate systems to suit physical problems as the following
example illustrates. Suppose that $p = p(x, y, z)$ is the pressure in a fluid
flowing parallel to the $z$-axis. Then $\partial p/\partial z$ is the pressure gradient along the
direction of flow and $\partial p/\partial x$, $\partial p/\partial y$ are the transverse pressure gradients in the
plane $z$ = constant.







Fig. 5-18 Cylindrical polar coordinates. 



Now, if the flow takes place in a rectangular duct with sides described by
$x$ = constant, $y$ = constant, then the Cartesian coordinates $O\{x, y, z\}$ are
obviously the natural ones to use. However, if the flow takes place in a
cylindrical pipe, then the $z$-axis is still convenient as it can be aligned with the
axis of the pipe, but the $x$-, $y$-axes are now less useful since the wall of the
pipe becomes the curve $x^2 + y^2$ = constant. Clearly, a more sensible coordinate
system would be the cylindrical polar coordinates $r$, $\theta$, $z'$ in which $r$ and
$\theta$ define a point in the plane $z'$ = constant. Figure 5-18 illustrates this idea.
The plane $z = z' = 0$ is common to both the $O\{x, y, z\}$ and $O\{r, \theta, z'\}$ systems
of axes, and is denoted by $\Pi$. Relative to these two systems the point $P$ has the
coordinates $O\{x, y, z\}$ and $O\{r, \theta, z'\}$, respectively, where

$$x = r \cos\theta, \quad y = r \sin\theta, \quad z = z'. \qquad \text{(5-33)}$$

How can the pressure gradients described by the partial derivatives
$\partial p/\partial r$, $\partial p/\partial\theta$, and $\partial p/\partial z'$ be determined from Eqn (5-33), and the known
functions $\partial p/\partial x$, $\partial p/\partial y$, and $\partial p/\partial z$? The rest of this section is devoted to solving
this type of problem. Notice that from the definition of partial differentiation,
$\partial p/\partial z$ and $\partial p/\partial z'$ have essentially the same meaning, whereas $\partial p/\partial r$ is the
derivative of $p$ computed along a radius with $\theta$ and $z'$ held constant, whilst
$\partial p/\partial\theta$ is the derivative of $p$ tangential to a circle $r$ = constant drawn on the
plane $z'$ = constant.

Although the replacement of coordinate variables in this manner involves
replacing a set of $n$ independent variables by a new set also comprising $n$ in
number ($n = 3$ above), we shall first prove a more general result. Specifically,
consider the implication of the situation in which

$$u = f(x_1, x_2, \ldots, x_n), \qquad \text{(5-34)}$$

when the independent variables $x_1, x_2, \ldots, x_n$ are themselves differentiable
functions of another set of variables which we denote by $\alpha_1, \alpha_2, \ldots, \alpha_m$.
It is not necessary that $m$ should equal $n$. Thus we have

$$\begin{aligned}
x_1 &= x_1(\alpha_1, \alpha_2, \ldots, \alpha_m), \\
x_2 &= x_2(\alpha_1, \alpha_2, \ldots, \alpha_m), \\
&\;\;\vdots \\
x_n &= x_n(\alpha_1, \alpha_2, \ldots, \alpha_m).
\end{aligned} \qquad \text{(5-35)}$$

If the variables $x_i$ in Eqn (5-34) were to be replaced by the equivalent functions
(5-35) involving the variables $\alpha_j$, then $f$ would become some function
$F(\alpha_1, \alpha_2, \ldots, \alpha_m)$ of $\alpha_1, \alpha_2, \ldots, \alpha_m$ so that by Theorem 5-19 we could write

$$du = \frac{\partial F}{\partial \alpha_1}\,d\alpha_1 + \frac{\partial F}{\partial \alpha_2}\,d\alpha_2 + \cdots + \frac{\partial F}{\partial \alpha_m}\,d\alpha_m. \qquad \text{(5-36)}$$



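Next, observe that by applying this same theorem to the equation for
$x_i$ in Eqn (5-35) we obtain

$$dx_i = \frac{\partial x_i}{\partial \alpha_1}\,d\alpha_1 + \frac{\partial x_i}{\partial \alpha_2}\,d\alpha_2 + \cdots + \frac{\partial x_i}{\partial \alpha_m}\,d\alpha_m, \qquad \text{(5-37)}$$

for $i = 1, 2, \ldots, n$.

Substituting these expressions into the statement of Theorem 5-19 then
gives

$$du = \frac{\partial f}{\partial x_1}\left[\frac{\partial x_1}{\partial \alpha_1}\,d\alpha_1 + \cdots + \frac{\partial x_1}{\partial \alpha_m}\,d\alpha_m\right] + \cdots + \frac{\partial f}{\partial x_n}\left[\frac{\partial x_n}{\partial \alpha_1}\,d\alpha_1 + \cdots + \frac{\partial x_n}{\partial \alpha_m}\,d\alpha_m\right]. \qquad \text{(5-38)}$$

On re-arrangement this becomes

$$du = \left[\frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_1} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_1} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_1}\right] d\alpha_1 + \cdots + \left[\frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_m} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_m} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_m}\right] d\alpha_m. \qquad \text{(5-39)}$$

Since $f(x_1, x_2, \ldots, x_n) = F(\alpha_1, \alpha_2, \ldots, \alpha_m)$ it follows by a direct
comparison of the $i$th terms of Eqns (5-36) and (5-39) that

$$\frac{\partial F}{\partial \alpha_i} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_i} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_i} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_i} \qquad \text{(5-40)}$$

for $i = 1, 2, \ldots, m$.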

We state this result in the form of a general theorem. 

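Theorem 5-22 (change of variable) Let $f(x_1, x_2, \ldots, x_n)$ be a real valued
function of the real variables $x_1, x_2, \ldots, x_n$ whose first-order derivatives exist
and are continuous. Further, let $x_1 = x_1(\alpha_1, \alpha_2, \ldots, \alpha_m)$, $x_2 = x_2(\alpha_1, \alpha_2, \ldots, \alpha_m)$,
$\ldots$, $x_n = x_n(\alpha_1, \alpha_2, \ldots, \alpha_m)$ be differentiable functions of the real variables
$\alpha_1, \alpha_2, \ldots, \alpha_m$, then

$$\frac{\partial f}{\partial \alpha_1} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_1} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_1} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_1},$$

$$\frac{\partial f}{\partial \alpha_2} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_2} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_2} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_2},$$

$$\vdots$$

$$\frac{\partial f}{\partial \alpha_m} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_m} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_m} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial \alpha_m}.$$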



Example 5-24 Express df/dr, 8fj8d, and 8fj8z' in terms of df/dx, 8fj8y, and 
8fj8z given that x = r cos 6, y — r sin 6, z = z'. Find their values given that 

/(x, y, z) = x 2 + 3xy + y z + z 2 . 

Solution We must apply Theorem 5-22 with m = n = 3 by making the 
identifications xi = x, x% = y, xz = z and oci = r, «2 = 6, 0C3 = z'. Our 
first result is 

8r 8x 8r 8y 8r 8z 8r 

8f _ 8f 8x 8f 8y 8f 8z 
86 ~ '8x ~86 + '~8y lid* "SzW 

8f _ 8f 8x 8f 8y 8f 8z 
8? _ 8x ~8z' + ~8y 8? + ~8z ~8z~'' 

However, 

8x 8x . 8x 8y . 

— = cos 6, —- -r sin 6, — = 0, — = sin d, 
8r 80 8z 8r 

8y 8y 8x 8y „ 8z' 

■4 = r cos 0, — , = 0, —=■/• = 0, — = 1. 

86 8z' 8z' 8z' 8z 

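Hence, substituting these values into the above transformation equations
shows that

$$\frac{\partial f}{\partial r} = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta,$$

$$\frac{\partial f}{\partial \theta} = -\frac{\partial f}{\partial x}\,r\sin\theta + \frac{\partial f}{\partial y}\,r\cos\theta,$$

$$\frac{\partial f}{\partial z'} = \frac{\partial f}{\partial z}.$$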

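Next, using the fact that $f(x, y, z) = x^2 + 3xy + y^2 + z^2$ we see that

$$\frac{\partial f}{\partial x} = 2x + 3y, \quad \frac{\partial f}{\partial y} = 3x + 2y, \quad \frac{\partial f}{\partial z} = 2z,$$

so that

$$\frac{\partial f}{\partial r} = (2x + 3y)\cos\theta + (3x + 2y)\sin\theta.$$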

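However, as $r^2 = x^2 + y^2$ and $\cos\theta = x/(x^2 + y^2)^{1/2}$, $\sin\theta = y/(x^2 + y^2)^{1/2}$,
this result simplifies to

$$\frac{\partial f}{\partial r} = \frac{2x^2 + 6xy + 2y^2}{(x^2 + y^2)^{1/2}}.$$

A similar calculation shows that

$$\frac{\partial f}{\partial \theta} = 3(x^2 - y^2), \quad \frac{\partial f}{\partial z'} = 2z.$$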
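The results of Example 5-24 can be checked numerically. The sketch below is an illustration, not part of the text: it writes f in terms of r, θ, z', estimates the three partial derivatives by central differences, and compares them with the closed forms obtained above. The helper names `f_cyl`, `partial` and `example_5_24_formulas` are hypothetical, and the sample point is an arbitrary choice with r > 0.

```python
import math

def f(x, y, z):
    """The function of Example 5-24."""
    return x**2 + 3*x*y + y**2 + z**2

def f_cyl(r, theta, z_prime):
    """f expressed through x = r cos(theta), y = r sin(theta), z = z'."""
    return f(r*math.cos(theta), r*math.sin(theta), z_prime)

def partial(func, args, index, h=1e-6):
    """Central-difference estimate of d(func)/d(args[index]) (hypothetical helper)."""
    up, down = list(args), list(args)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2*h)

def example_5_24_formulas(r, theta, z_prime):
    """Closed forms from the example, evaluated at (r, theta, z')."""
    x, y = r*math.cos(theta), r*math.sin(theta)
    df_dr = (2*x**2 + 6*x*y + 2*y**2) / math.sqrt(x**2 + y**2)
    df_dtheta = 3*(x**2 - y**2)
    df_dz_prime = 2*z_prime
    return df_dr, df_dtheta, df_dz_prime

point = (2.0, 0.7, 1.5)  # arbitrary sample values of r, theta, z'
numeric = tuple(partial(f_cyl, point, i) for i in range(3))
exact = example_5_24_formulas(*point)
print("finite differences:", numeric)
print("closed forms:      ", exact)
```

The two printed triples should agree to within the accuracy of the finite-difference approximation.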

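Consider the special case of Theorem 5-22 that results when $m = n = 2$,
so that its statement becomes

$$\frac{\partial f}{\partial \alpha_1} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_1} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_1}, \qquad \frac{\partial f}{\partial \alpha_2} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial \alpha_2} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial \alpha_2}. \tag{5-41}$$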



Now for any differentiable function $f(x_1, x_2)$, once the variable change has
been decided, these equations express the partial derivatives $f_{\alpha_1}, f_{\alpha_2}$ in terms
of $f_{x_1}, f_{x_2}$ which we suppose to be known. However, if $\partial f/\partial\alpha_1$ and $\partial f/\partial\alpha_2$ are
supposed known, then Eqns (5-41) can be regarded as simultaneous equations
for $f_{x_1}, f_{x_2}$. Thus, provided the simultaneous equations can be solved, Eqns
(5-41) may be regarded as describing a one-one transformation, or mapping,
between partial derivatives of $f$ with respect to $(x_1, x_2)$ and $(\alpha_1, \alpha_2)$.

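It is easily seen that provided $J(x_1, x_2) \neq 0$, we have

$$\frac{\partial f}{\partial x_1} = \left(\frac{\partial f}{\partial \alpha_1}\frac{\partial x_2}{\partial \alpha_2} - \frac{\partial f}{\partial \alpha_2}\frac{\partial x_2}{\partial \alpha_1}\right)\bigg/ J(x_1, x_2), \qquad \frac{\partial f}{\partial x_2} = \left(\frac{\partial f}{\partial \alpha_2}\frac{\partial x_1}{\partial \alpha_1} - \frac{\partial f}{\partial \alpha_1}\frac{\partial x_1}{\partial \alpha_2}\right)\bigg/ J(x_1, x_2), \tag{5-42}$$

where

$$J(x_1, x_2) = \frac{\partial x_1}{\partial \alpha_1}\frac{\partial x_2}{\partial \alpha_2} - \frac{\partial x_1}{\partial \alpha_2}\frac{\partial x_2}{\partial \alpha_1} = \begin{vmatrix} \dfrac{\partial x_1}{\partial \alpha_1} & \dfrac{\partial x_2}{\partial \alpha_1} \\[2mm] \dfrac{\partial x_1}{\partial \alpha_2} & \dfrac{\partial x_2}{\partial \alpha_2} \end{vmatrix}. \tag{5-43}$$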



The expression $J(x_1, x_2)$ is the Jacobian of the transformation and is
usually written in the form of the functional determinant shown in Eqn
(5-43). If the Jacobian vanishes at any point in the $(\alpha_1, \alpha_2)$-space then at such
points the transformation we are discussing becomes invalid and is
singular. This is because at such points Eqns (5-41) can no longer be solved
uniquely for $f_{x_1}, f_{x_2}$, so the one-one relationship between partial derivatives
in the two coordinate systems breaks down. In more advanced
discussions, the Jacobian is shown to play a fundamental role in all matters
relating to changes of variable. Sometimes, to emphasize the variables involved,
in place of $J(x_1, x_2)$ the alternative notation $\partial(x_1, x_2)/\partial(\alpha_1, \alpha_2)$ is
used. This idea is readily extended to more than two independent variables as
would be appropriate in Example 5-24, where three variables are involved.

The non-vanishing of the Jacobian is thus seen to provide an essential
condition for the partial derivatives of any differentiable function $f$, with
respect to $(x_1, x_2)$ and $(\alpha_1, \alpha_2)$, to be interchangeable by virtue of Eqns (5-42).
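As an illustration of this interchange, the following sketch solves the pair of simultaneous equations (5-41) for the derivatives with respect to $(x_1, x_2)$, using the determinant $J = \frac{\partial x_1}{\partial\alpha_1}\frac{\partial x_2}{\partial\alpha_2} - \frac{\partial x_1}{\partial\alpha_2}\frac{\partial x_2}{\partial\alpha_1}$. The function name `recover_derivatives`, its argument order, and the polar-coordinate test function $f = x^2 + 3xy + y^2$ are assumptions made for the example only.

```python
import math

def recover_derivatives(f_a1, f_a2, x1_a1, x1_a2, x2_a1, x2_a2):
    """Solve Eqns (5-41) for (df/dx1, df/dx2).

    f_a1, f_a2 : df/d(alpha1), df/d(alpha2)
    xi_aj      : d(x_i)/d(alpha_j)
    """
    J = x1_a1*x2_a2 - x1_a2*x2_a1  # Jacobian of the transformation
    if J == 0:
        raise ZeroDivisionError("Jacobian vanishes: the transformation is singular")
    f_x1 = (f_a1*x2_a2 - f_a2*x2_a1) / J
    f_x2 = (f_a2*x1_a1 - f_a1*x1_a2) / J
    return f_x1, f_x2

# Polar coordinates: x = r cos(theta), y = r sin(theta), with alpha1 = r, alpha2 = theta.
r, theta = 2.0, 0.7
c, s = math.cos(theta), math.sin(theta)
x, y = r*c, r*s

# Known Cartesian derivatives of the sample function f = x^2 + 3xy + y^2.
f_x, f_y = 2*x + 3*y, 3*x + 2*y

# Forward direction, Eqns (5-41): derivatives with respect to r and theta.
f_r = f_x*c + f_y*s
f_theta = -f_x*r*s + f_y*r*c

# Reverse direction: recover f_x and f_y from f_r and f_theta.
recovered = recover_derivatives(f_r, f_theta, c, -r*s, s, r*c)
print(recovered, (f_x, f_y))
```

At the origin r = 0 the Jacobian is zero and the function raises an error, which mirrors the singular behaviour described above.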

Example 5-25 Find the Jacobians of the following transformations and
state where, if at all, they vanish:

(a) $x = r\cos\theta$, $y = r\sin\theta$ (polar coordinates);

(b) $x = u + v$, $y = u - v$;

(c) $x = 3u^2 + v^2$, $y = u + v$.



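Solution

$$\text{(a)} \quad J(x, y) = \frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \cos\theta & \sin\theta \\ -r\sin\theta & r\cos\theta \end{vmatrix} = r(\cos^2\theta + \sin^2\theta) = r.$$

Hence in the case of polar coordinates the Jacobian vanishes at $r = 0$
(that is, at the origin) which is the only singular point of the transformation.

$$\text{(b)} \quad J(x, y) = \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} 1 & 1 \\ 1 & -1 \end{vmatrix} = -2.$$

This Jacobian never vanishes so that the transformation is always permissible.

$$\text{(c)} \quad J(x, y) = \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} 6u & 1 \\ 2v & 1 \end{vmatrix} = 6u - 2v.$$

The Jacobian vanishes when $3u = v$, so that the transformation is invalid, or
singular, at all points on that line in the $(u, v)$-plane.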
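The three Jacobians of Example 5-25 can also be estimated numerically. The sketch below is an illustration under stated assumptions: the helper `jacobian` is hypothetical, it approximates the four partial derivatives by central differences, and it forms the determinant in the same arrangement as Eqn (5-43). The sample points are arbitrary.

```python
import math

def jacobian(x_of, y_of, u, v, h=1e-6):
    """Finite-difference estimate of d(x, y)/d(u, v) at the point (u, v)."""
    x_u = (x_of(u + h, v) - x_of(u - h, v)) / (2*h)
    x_v = (x_of(u, v + h) - x_of(u, v - h)) / (2*h)
    y_u = (y_of(u + h, v) - y_of(u - h, v)) / (2*h)
    y_v = (y_of(u, v + h) - y_of(u, v - h)) / (2*h)
    return x_u*y_v - x_v*y_u

# (a) polar coordinates: the Jacobian should equal r
polar = jacobian(lambda r, t: r*math.cos(t), lambda r, t: r*math.sin(t), 2.0, 0.3)

# (b) x = u + v, y = u - v: the Jacobian should equal -2 everywhere
linear = jacobian(lambda u, v: u + v, lambda u, v: u - v, 1.0, 5.0)

# (c) x = 3u^2 + v^2, y = u + v: the Jacobian 6u - 2v vanishes on the line v = 3u
on_line = jacobian(lambda u, v: 3*u**2 + v**2, lambda u, v: u + v, 1.0, 3.0)
off_line = jacobian(lambda u, v: 3*u**2 + v**2, lambda u, v: u + v, 1.0, 1.0)

print(polar, linear, on_line, off_line)
```

The printed values should be close to 2, -2, 0 and 4 respectively.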

5-10 Implicit functions 

We have already used implicit functions when discussing various consequences
of total differentials, and will now examine these ideas more closely. Consider
the equation $f(x, y) = 0$. Often the argument is used that from this implicit
function of $x$ and $y$ we can, in principle, solve for $y$, and as $y$ depends on $x$,
we are entitled to express $y$ in the explicit form $y = \varphi(x)$.

Suppose that $f(x, y) = x^2 + y^2 + 1$. Then no real values of $x$ and $y$
satisfy the implicit equation $f(x, y) = 0$, so certainly in this case one cannot
solve for $y$. Thus a necessary condition that we may solve for $y$ near to some
point $P$ with coordinates $(x_0, y_0)$ is that there are real numbers $x_0, y_0$ such
that $f(x_0, y_0) = 0$.

Now let $u = f(x, y)$ be the graph of $f(x, y)$, and assume that $f_x$ and $f_y$
exist and are continuous so that the graph will be a smooth surface of the
type shown in Fig. 5-19. Then $f(x, y) = 0$ is the curve of the section of this
surface by the plane $u = 0$. In general the curve of the section will be similar
to the smooth curve $L$ shown in the figure and can be described by an equation
of the form $y = \varphi(x)$. This will obviously be the case provided firstly,
that the surface $u = f(x, y)$ and the plane $u = 0$ intersect and secondly, that
they are nowhere tangential. The curve $L$ will be smooth, and the function
$\varphi(x)$ differentiable, because the assumed continuity of the derivatives $f_x$ and
$f_y$ will ensure that the surface $u = f(x, y)$ is itself smooth, and so will generate
a smooth curve of section. This is, of course, the assertion made in Corollary
5-19 (a). Let $P$ be a representative point on $L$ with coordinates $(x_0, y_0)$ in the
$u = 0$ plane, and line $l$ be drawn tangential to the surface $u = f(x, y)$ at $P$
in the plane $x = x_0$. Then by Definition 5-5, the angle $\alpha$ between line $l$ and
the plane $u = 0$ is such that $\tan\alpha = \partial f/\partial y\,|_{(x_0, y_0)}$.

Fig. 5-19 The function $y = \varphi(x)$ defined by the intersection of $u = f(x, y)$ and the plane $u = 0$.

Hence the condition that the surface $u = f(x, y)$ and the plane $u = 0$
should not be tangential at $P$ is seen to be $f_y(x_0, y_0) \neq 0$. Collecting our
results we now formulate them as the following theorem.

Theorem 5-23 (implicit function theorem) Let $f(x, y)$ be differentiable and
have continuous first-order partial derivatives near to $(x_0, y_0)$ at which
$f(x_0, y_0) = 0$ and $f_y(x_0, y_0) \neq 0$. Then, near $(x_0, y_0)$, it is possible to solve the
implicit equation $f(x, y) = 0$ uniquely for $y$ in the explicit form $y = \varphi(x)$,
where $\varphi(x)$ is differentiable. That is, near to $(x_0, y_0)$, $f(x, \varphi(x)) = 0$.

Notice that this theorem is only of the existence type in that it ensures
that an explicit representation $y = \varphi(x)$ exists, but gives no information on
how such a representation may be found in any specific case.
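Although the theorem itself gives no construction, values of the implicit function can often be computed numerically near a point where $f_y \neq 0$. The sketch below uses Newton's method in $y$ for fixed $x$; this is a technique brought in for illustration, not one taken from the text. The function name `solve_for_y` and the test equation $f(x, y) = x^2 + y^2 - 1$ near $(0, 1)$ are assumptions of the example.

```python
import math

def solve_for_y(f, f_y, x, y_start, tol=1e-12, max_iter=50):
    """Solve f(x, y) = 0 for y at fixed x by Newton iteration started from y_start.

    The iteration divides by f_y, so it relies on the condition f_y != 0
    that appears in the implicit function theorem.
    """
    y = y_start
    for _ in range(max_iter):
        step = f(x, y) / f_y(x, y)
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton iteration did not converge")

def circle(x, y):
    return x**2 + y**2 - 1.0

def circle_y(x, y):
    return 2.0*y  # partial derivative of circle with respect to y

# Near (x0, y0) = (0, 1) we have f_y = 2 != 0, so y = phi(x) exists there.
phi_at_03 = solve_for_y(circle, circle_y, 0.3, 1.0)
print(phi_at_03, math.sqrt(1.0 - 0.3**2))
```

For this particular equation the explicit form $\varphi(x) = \sqrt{1 - x^2}$ is known, which gives a direct check on the computed value.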

As a corollary to this theorem, consider the relationship between the
derivatives of a function and its inverse. Let $F(x, y) = y - f(x)$, so that
$F(x, y) = 0$ implies the relationship $y = f(x)$. Suppose that at some point
$(x_0, y_0)$ we have $f'(x_0) \neq 0$ and $y_0 = f(x_0)$. Then, noticing that $\partial F/\partial x
= (\partial/\partial x)[-f(x)] = (d/dx)[-f(x)] = -f'(x)$ and $\partial F/\partial y = 1$, it follows from
Theorem 5-23 that close to $(x_0, y_0)$ we may solve for $x$ as a function of $y$ to
obtain an inverse function $x = \varphi(y)$. That is, $F(\varphi(y), y) = y - f[\varphi(y)] = 0$.

Furthermore, applying Corollary 5-19(a) to $F(x, y) = 0$ and regarding
$y$ as the independent variable and $x$ as the dependent variable, we have

$$\frac{dx}{dy} = -\frac{\partial F/\partial y}{\partial F/\partial x} = -\frac{1}{-f'(x)},$$

so that provided $f'(x) \neq 0$, we have

$$\frac{dx}{dy} = 1/f'(x) \quad \text{or} \quad \varphi'(y) = 1/f'(x),$$

which is the desired result.



Corollary 5-23 Let $y = f(x)$ be a real valued differentiable function of $x$
close to some point $(x_0, y_0)$ at which $y_0 = f(x_0)$. Let $x = \varphi(y)$ be the function
inverse to it close to the same point $(x_0, y_0)$ so that $x_0 = \varphi(y_0)$, and let
$f'(x_0) \neq 0$. Then close to $(x_0, y_0)$, we have

$$\varphi'(y) = 1/f'(x)$$

or, equivalently,

$$\frac{dx}{dy} = 1\bigg/\left(\frac{dy}{dx}\right).$$

This corollary has two important applications which we mention next. The
first application of Corollary 5-23 is to the differentiation of inverse circular
functions. In Section 2-2, we agreed to write

$$y = \text{arc sin}\,x \quad \text{when} \quad x = \sin y \quad \text{and} \quad -\pi/2 \le y \le \pi/2.$$

Now,

$$\frac{d}{dy}(\sin y) = \cos y \neq 0 \quad \text{for} \quad -\pi/2 < y < \pi/2;$$

that is, for $-1 < x < 1$ and so, by Corollary 5-23,

$$\frac{dy}{dx} = 1\bigg/\left(\frac{dx}{dy}\right) = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - \sin^2 y}} = \frac{1}{\sqrt{1 - x^2}}.$$

The positive square root has been taken here because the principal branch of
the function $y = \text{arc sin}\,x$ is a monotonic increasing function of $x$ in its
domain of definition $-1 \le x \le 1$. By this same argument, the negative
square root is taken when differentiating the principal branch of the function
$y = \text{arc cos}\,x$ which is a monotonic decreasing function of $x$ in its domain
of definition $-1 \le x \le 1$. Thus

$$\frac{d}{dx}(\text{arc sin}\,x) = \frac{1}{\sqrt{1 - x^2}} \quad \text{for} \quad -1 < x < 1.$$

Similar arguments establish Table 5-2. In the entries for the derivatives 
of arc cosec and arc sec, the term \x\ has been introduced to take account of 
the two separate cases that need consideration when deriving these results; 
namely, when x > a and when x < —a. These same ideas will be encountered 
again in the next chapter in connection with Table 6-3, when they will be 
discussed in more detail. 

Table 5-2 Derivatives of inverse circular functions

$\dfrac{d}{dx}(\text{arc sin}\,x/a) = \dfrac{1}{\sqrt{a^2 - x^2}}$ for $-a < x < a$

$\dfrac{d}{dx}(\text{arc cos}\,x/a) = \dfrac{-1}{\sqrt{a^2 - x^2}}$ for $-a < x < a$

$\dfrac{d}{dx}(\text{arc tan}\,x/a) = \dfrac{a}{a^2 + x^2}$ for all $x$

$\dfrac{d}{dx}(\text{arc cosec}\,x/a) = \dfrac{-a}{|x|\sqrt{x^2 - a^2}}$ for $|x| > a$

$\dfrac{d}{dx}(\text{arc sec}\,x/a) = \dfrac{a}{|x|\sqrt{x^2 - a^2}}$ for $|x| > a$

$\dfrac{d}{dx}(\text{arc cot}\,x/a) = \dfrac{-a}{a^2 + x^2}$ for all $x$
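Several entries of Table 5-2 can be spot-checked numerically. The sketch below is an illustration only: it assumes $a > 0$, takes arc sec $u$ and arc cosec $u$ on their principal branches as $\arccos(1/u)$ and $\arcsin(1/u)$, and compares a central-difference derivative with the tabulated formula at one sample point each. The arc cot entry is left out because principal-branch conventions for it vary. The names `numeric_derivative`, `checks` and `errors` are hypothetical.

```python
import math

def numeric_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x) (hypothetical helper)."""
    return (func(x + h) - func(x - h)) / (2*h)

a = 2.0  # assumed positive scale constant

# (name, function of x, tabulated derivative, sample point)
checks = [
    ("arc sin",   lambda x: math.asin(x/a), lambda x: 1/math.sqrt(a*a - x*x),              0.5),
    ("arc cos",   lambda x: math.acos(x/a), lambda x: -1/math.sqrt(a*a - x*x),             0.5),
    ("arc tan",   lambda x: math.atan(x/a), lambda x: a/(a*a + x*x),                       3.0),
    ("arc sec",   lambda x: math.acos(a/x), lambda x: a/(abs(x)*math.sqrt(x*x - a*a)),    -3.0),
    ("arc cosec", lambda x: math.asin(a/x), lambda x: -a/(abs(x)*math.sqrt(x*x - a*a)),   -3.0),
]

errors = {name: abs(numeric_derivative(fn, x0) - formula(x0))
          for name, fn, formula, x0 in checks}
for name, err in errors.items():
    print(f"{name:10s} discrepancy = {err:.2e}")
```

The negative sample point used for arc sec and arc cosec exercises the case $x < -a$, which is where the factor $|x|$ in the table matters.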

In Chapter 2 we saw that curves may be described parametrically thus:

$$x = X(t), \quad y = Y(t), \tag{5-44}$$

where $t$ is a parameter defined in some interval $J$. The question that now
arises is how may we find $dy/dx$ in terms of the functions $X(t)$ and $Y(t)$.

Let us suppose that $X(t)$ and $Y(t)$ are differentiable functions of $t$ with
continuous derivatives and that $X'(t) \neq 0$. Then by Theorem 5-23, we may
solve $x = X(t)$ in the form $t = f(x)$, say, so that then $y = Y[f(x)]$. From
Theorem 5-7 on the differentiation of composite functions we have

$$\frac{dy}{dx} = \frac{d}{dx}\,Y[f(x)] = \frac{dY}{dt}\frac{df}{dx}$$

or, equivalently,

$$\frac{dy}{dx} = \frac{dy}{dt}\cdot\frac{dt}{dx}. \tag{5-45}$$

However, by Corollary 5-23, $dt/dx = 1/(dx/dt)$ so that

$$\frac{dy}{dx} = \frac{dy}{dt}\bigg/\frac{dx}{dt}. \tag{5-46}$$

Hence, like x and y, the derivative dy/dx is now also known parametrically 
in terms of t. 

This result is best remembered in symbolic operator form:

$$\frac{d}{dx} \equiv \frac{1}{(dx/dt)}\frac{d}{dt}. \tag{5-47}$$

Higher order derivatives with respect to $x$ may be found either by a
repetition of the argument leading to Eqn (5-46), or by successive applications
of Eqn (5-47).

Thus, using Eqn (5-47), we have

$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{1}{(dx/dt)}\left\{\frac{d}{dt}\left[\frac{dy}{dt}\bigg/\frac{dx}{dt}\right]\right\}$$

or, denoting differentiation with respect to $t$ by a dot,

$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{1}{\dot{x}}\frac{d}{dt}\left(\frac{dy}{dx}\right).$$

Using the fact that $dy/dx = \dot{y}/\dot{x}$ and performing the indicated differentiations
gives

$$\frac{d^2y}{dx^2} = \frac{\dot{x}\ddot{y} - \dot{y}\ddot{x}}{\dot{x}^3}. \tag{5-48}$$

It is recommended that the reader remembers the arguments leading to the 
operator rule (5-47) together with the rule itself, rather than remembering 






results of the form (5-48). 

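Example 5-26 If $x = t + 2\sin t$, $y = \cos t$ determine $dy/dx$ and $d^2y/dx^2$
and hence deduce their values when $t = 0$.

Solution We have

$$\frac{dx}{dt} = 1 + 2\cos t, \quad \frac{dy}{dt} = -\sin t,$$

so that by Eqn (5-46)

$$\frac{dy}{dx} = \frac{dy}{dt}\bigg/\frac{dx}{dt} = \frac{-\sin t}{1 + 2\cos t}.$$

When $t = 0$ we have $x = 0$, $y = 1$ and

$$\frac{dy}{dx}\bigg|_{x=0} = \left[\frac{-\sin t}{1 + 2\cos t}\right]_{t=0} = 0.$$

Next, as

$$\frac{d^2y}{dx^2} = \frac{1}{(dx/dt)}\frac{d}{dt}\left(\frac{dy}{dx}\right)$$

we have

$$\frac{d^2y}{dx^2} = \frac{1}{1 + 2\cos t}\,\frac{d}{dt}\left[\frac{-\sin t}{1 + 2\cos t}\right].$$

Thus, performing the differentiation and simplifying,

$$\frac{d^2y}{dx^2} = -\frac{2 + \cos t}{(1 + 2\cos t)^3}$$

and so

$$\frac{d^2y}{dx^2}\bigg|_{x=0} = \left[-\frac{2 + \cos t}{(1 + 2\cos t)^3}\right]_{t=0} = -\frac{1}{9}.$$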
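The calculation in Example 5-26 can be reproduced in two ways, which also serves as a check on the operator rule (5-47) against the closed form (5-48). In the sketch below the function names are hypothetical; the first version substitutes the derivatives of $x = t + 2\sin t$, $y = \cos t$ into Eqn (5-48), and the second applies $(1/\dot{x})\,d/dt$ to $dy/dx$ using a central difference.

```python
import math

def dy_dx(t):
    """dy/dx from Eqn (5-46) for x = t + 2 sin t, y = cos t."""
    return -math.sin(t) / (1 + 2*math.cos(t))

def d2y_dx2_formula(t):
    """Second derivative from Eqn (5-48): (x' y'' - y' x'') / x'^3."""
    x_dot, x_ddot = 1 + 2*math.cos(t), -2*math.sin(t)
    y_dot, y_ddot = -math.sin(t), -math.cos(t)
    return (x_dot*y_ddot - y_dot*x_ddot) / x_dot**3

def d2y_dx2_operator(t, h=1e-5):
    """Second derivative from the operator rule (5-47): (1/x') d/dt (dy/dx),
    with d/dt approximated by a central difference."""
    x_dot = 1 + 2*math.cos(t)
    return (dy_dx(t + h) - dy_dx(t - h)) / (2*h) / x_dot

print(dy_dx(0.0), d2y_dx2_formula(0.0), d2y_dx2_operator(0.0))
```

Both versions of the second derivative should agree with each other for any $t$ at which $1 + 2\cos t \neq 0$.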



5-11 Higher order partial derivatives 

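If the function $f(x, y)$ is differentiable with continuous first-order derivatives
$f_x$ and $f_y$, then it can also happen that these partial derivatives, which are
functions of $x$ and $y$, are themselves differentiable. Thus we are led to consider
the further partial derivatives

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), \quad \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right), \quad \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right), \quad \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right).$$

These functions, when they exist, are second-order partial derivatives of $f$
and are respectively denoted by

$$\frac{\partial^2 f}{\partial x^2}, \quad \frac{\partial^2 f}{\partial y\,\partial x}, \quad \frac{\partial^2 f}{\partial x\,\partial y} \quad \text{and} \quad \frac{\partial^2 f}{\partial y^2}.$$

Using an alternative notation we often write these same derivatives as

$$f_{xx}, \quad f_{xy}, \quad f_{yx}, \quad \text{and} \quad f_{yy}.$$

In this notation the first suffix signifies the partial derivative of $f$ that is to be
differentiated partially with respect to the second suffix. The centre pair of
derivatives are mixed second-order partial derivatives and it is conventional
that the order of $x$ and $y$ in corresponding mixed derivatives in the two
notations is interchanged. Thus we have,

$$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial y\,\partial x} = f_{xy} \quad \text{but} \quad \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial x\,\partial y} = f_{yx}.$$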

It is important to notice that the double operations of partial differentiation
that lead to the mixed derivatives $f_{xy}$ and $f_{yx}$ are performed in different
orders. Consequently we have no right to expect that the derivatives that
result will be equal to one another. To emphasize this point we now write out
in full the limiting operations involved in arriving at $f_{xy}$ and $f_{yx}$:

$$f_{xy}(x_0, y_0) = \frac{\partial}{\partial y}\left[f_x(x, y)\right]\bigg|_{(x_0, y_0)} = \lim_{k\to 0}\frac{1}{k}\left\{\lim_{h\to 0}\left[\frac{f(x_0 + h, y_0 + k) - f(x_0, y_0 + k)}{h}\right] - \lim_{h\to 0}\left[\frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}\right]\right\}$$

and so, writing

$$g(x_0, y_0, h, k) = f(x_0 + h, y_0 + k) - f(x_0, y_0 + k) - f(x_0 + h, y_0) + f(x_0, y_0),$$

we obtain the result

$$f_{xy}(x_0, y_0) = \lim_{k\to 0}\,\lim_{h\to 0}\,\frac{1}{hk}\,g(x_0, y_0, h, k), \tag{5-49}$$

where the inner limit with respect to $h$ is to be taken first. Exactly similar
reasoning gives the corresponding result

$$f_{yx}(x_0, y_0) = \lim_{h\to 0}\,\lim_{k\to 0}\,\frac{1}{hk}\,g(x_0, y_0, h, k). \tag{5-50}$$

Here it is the inner limit with respect to $k$ that is to be taken first.

The double limits used in Eqns (5-49) and (5-50) are called iterated limits
on account of the fact that they are taken sequentially so that their order is
important. They are not to be confused with the simple double limit of
Definition 3-8 into which questions of order do not enter.

Let us now explore the consequence of requiring one of the mixed
derivatives, say $f_{xy}$, to be continuous. This is, of course, the usual situation.
Definitions 3-8 and 3-9 imply that if $f_{xy}$ is continuous at $(x_0, y_0)$, then a limit
$L = f_{xy}(x_0, y_0)$ exists with the property that

$$L = \lim_{\substack{h\to 0 \\ k\to 0}} f_{xy}(x_0 + h, y_0 + k), \tag{5-51}$$

where the question of the order in which the limits are to be taken does not
occur. Hence, as $f_{xy}(x_0, y_0)$ is also defined by Eqn (5-49) in which an iterated
limit is involved, the equating of these two results implies that if $f_{xy}$ is continuous,
then the order of the iterated limits in Eqn (5-49) is immaterial. Thus,
under the stated conditions, expressions (5-49) and (5-50) become identical
and the continuity of $f_{xy}$ implies not only the existence of $f_{yx}$, but also that
$f_{xy} = f_{yx}$. This establishes our next result.

Theorem 5-24 (equality of mixed derivatives) Let $f(x, y)$ be a real valued
function of the real variables $x$, $y$, and let $f_x$, $f_y$, $f_{xy}$ exist and be continuous
in the neighbourhood of the point $(x_0, y_0)$. Then $f_{yx}$ also exists at $(x_0, y_0)$ and

$$\frac{\partial^2 f}{\partial x\,\partial y}\bigg|_{(x_0, y_0)} = \frac{\partial^2 f}{\partial y\,\partial x}\bigg|_{(x_0, y_0)}.$$



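Still higher-order derivatives can be defined by an obvious extension of
the notation. Thus, for a suitably differentiable function $f$ we may define the
third-order partial derivatives

$$f_{xxx}, \quad f_{yyx}, \quad f_{xyx}, \quad f_{yyy}, \quad \text{etc.}$$

If the higher-order derivatives involved are continuous then, by an
obvious extension of Theorem 5-24, the order of performing differentiations
may be disregarded. In the case of the mixed third-order partial derivative
$f_{xyx}$ this would imply that

$$f_{xyx} = \frac{\partial}{\partial x}\left[\frac{\partial}{\partial y}(f_x)\right] = \frac{\partial}{\partial y}\left[\frac{\partial}{\partial x}(f_x)\right] = f_{xxy}.$$

Hence, under these conditions, it is proper to extend the $\partial$ notation by
writing

$$\frac{\partial^3 f}{\partial x^3}, \quad \frac{\partial^3 f}{\partial x\,\partial y^2}, \quad \frac{\partial^3 f}{\partial x^2\,\partial y}, \quad \frac{\partial^3 f}{\partial y^3}.$$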

Example 5-27 If $f(x, y) = x^4 + 2x^2y^2 + xy^4$ find the second- and third-order
partial derivatives of $f$.

Solution First-order derivatives:

$$f_x = 4x^3 + 4xy^2 + y^4, \quad f_y = 4x^2y + 4xy^3.$$

Second-order derivatives:

$$f_{xx} = 12x^2 + 4y^2, \quad f_{yy} = 4x^2 + 12xy^2,$$

$$f_{xy} = \frac{\partial}{\partial y}(f_x) = 8xy + 4y^3.$$

This mixed derivative is continuous, and so $f_{xy} = f_{yx}$. As a check in this case
we compute $f_{yx}$ directly:

$$f_{yx} = \frac{\partial}{\partial x}(f_y) = 8xy + 4y^3.$$

Third-order derivatives:

$$f_{xxx} = 24x, \quad f_{yyy} = 24xy, \quad f_{xyy} = \frac{\partial}{\partial y}(f_{xy}) = 8x + 12y^2,$$

$$f_{xxy} = \frac{\partial}{\partial y}(f_{xx}) = 8y.$$

The continuity of the third-order derivatives we have computed ensures the
existence and equality of the other corresponding third-order derivatives that
may be defined. Thus, for example, as $f_{xxy} = 8y$ is continuous, there is no
need to compute $f_{xyx}$, since it exists and is equal to $f_{xxy}$.
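The equality of the mixed derivatives in Example 5-27 can be confirmed numerically by differentiating in each order. The sketch below is only an illustration: `d_dx` and `d_dy` are hypothetical helpers that return central-difference approximations to the partial derivatives, and the sample point is arbitrary.

```python
def f(x, y):
    """The function of Example 5-27."""
    return x**4 + 2*x**2*y**2 + x*y**4

def d_dx(func, h=1e-4):
    """Return a central-difference approximation to the x-derivative of func."""
    return lambda x, y: (func(x + h, y) - func(x - h, y)) / (2*h)

def d_dy(func, h=1e-4):
    """Return a central-difference approximation to the y-derivative of func."""
    return lambda x, y: (func(x, y + h) - func(x, y - h)) / (2*h)

f_xy = d_dy(d_dx(f))  # differentiate with respect to x first, then y
f_yx = d_dx(d_dy(f))  # differentiate with respect to y first, then x

x0, y0 = 1.2, 0.7
exact = 8*x0*y0 + 4*y0**3  # the closed form 8xy + 4y^3 found in the example
print(f_xy(x0, y0), f_yx(x0, y0), exact)
```

For a polynomial such as this one all derivatives are continuous, so the two orders of differentiation are expected to agree at every point.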

Example 5-28 Define the function $f$ by the requirement

$$f(x, y) = \begin{cases} \dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0) \\[2mm] 0 & \text{if both } x = 0 \text{ and } y = 0. \end{cases}$$

Deduce the value of each of the mixed derivatives at the origin.

Solution We shall use definitions (5-49) and (5-50) for this purpose by setting
$x_0 = 0$, $y_0 = 0$ so that

$$g(0, 0, h, k) = \frac{hk(h^2 - k^2)}{h^2 + k^2}.$$

Then, from Eqn (5-49),

$$f_{xy}(0, 0) = \lim_{k\to 0}\,\lim_{h\to 0}\,\frac{1}{hk}\left[\frac{hk(h^2 - k^2)}{h^2 + k^2}\right] = \lim_{k\to 0}\,\lim_{h\to 0}\,\frac{h^2 - k^2}{h^2 + k^2} = \lim_{k\to 0}\left(\frac{-k^2}{k^2}\right) = -1.$$

However, because the order of the iterated limits is reversed in Eqn (5-50),
the same argument also shows that

$$f_{yx}(0, 0) = \lim_{h\to 0}\,\lim_{k\to 0}\,\frac{h^2 - k^2}{h^2 + k^2} = \lim_{h\to 0}\left(\frac{h^2}{h^2}\right) = 1.$$

Thus $f_{xy}(0, 0) = -1$ whereas $f_{yx}(0, 0) = 1$. This occurs because the functions
$f_{xy}$ and $f_{yx}$ are not continuous at $(0, 0)$ as may be checked by direct calculation.
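The dependence on the order of the iterated limits in Example 5-28 can be seen numerically. The sketch below is an illustration, not a proof: it evaluates the quotient $g(0, 0, h, k)/(hk)$ from Eqns (5-49) and (5-50), and imitates "inner limit first" by making the inner increment many orders of magnitude smaller than the outer one. The function names and the particular step sizes are assumptions of the example.

```python
def f(x, y):
    """The function of Example 5-28, with the value 0 assigned at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x*y*(x*x - y*y) / (x*x + y*y)

def g_over_hk(h, k):
    """The quotient g(0, 0, h, k)/(hk) appearing in Eqns (5-49) and (5-50)."""
    g = f(h, k) - f(0.0, k) - f(h, 0.0) + f(0.0, 0.0)
    return g / (h*k)

def f_xy_origin(k=1e-3, h=1e-9):
    """Approximates Eqn (5-49): h tends to 0 first, so h is taken much smaller than k."""
    return g_over_hk(h, k)

def f_yx_origin(h=1e-3, k=1e-9):
    """Approximates Eqn (5-50): k tends to 0 first, so k is taken much smaller than h."""
    return g_over_hk(h, k)

print(f_xy_origin(), f_yx_origin())
```

The two printed values should be close to -1 and 1 respectively, in line with the result of the example.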

PROBLEMS 

Section 5-1 

5-1 Give examples of four physical quantities that are essentially defined in 
terms of a derivative. 

5-2 Use Definitions 5-1 and 5-2 to prove that the following functions are differ- 
entiable in the stated intervals and to compute their derivatives. Evaluate 
these derivatives for the stated values: 

(a) /(x) = 3x* in [o, 3], find/'(2); 

(b) /(x) - 2x» + x + 1 in [-1, 4]) find/'(3); 
(c) $f(x) = |x|$ in $(0, \infty)$, find $f'(1)$;

(d) $f(x) = |x|$ in $(-\infty, 0)$, find $f'(-3)$;

(e) $f(x) = 1/x$ in $[1, 5]$, find $f'(4)$;

(f) $f(x) = x^{1/4}$ in $(0, \infty)$, find $f'(2)$.

5-3 Prove that $f(x) = |x|$ is not differentiable at the origin.

5-4 Consider the graph of $f(x) = x^3 + x + 1$. Let $x_1$ and $x_2$ be two points on
the $x$-axis with the property that the gradient $dy/dx$ of the curve $y = f(x)$ at
$x = x_2$ is four times the gradient at $x = x_1$. Derive the algebraic equation
connecting $x_1$ and $x_2$ and deduce that $|x_2| \ge 1$.

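5-5 Deduce the gradients of the functions $f(x)$ to the immediate left and right of
$x = 1$ given that:

$$\text{(a)} \quad f(x) = \begin{cases} x^3 + x + 1 & \text{for } x > 1 \\ 5 - x - x^2 & \text{for } x < 1; \end{cases}$$

$$\text{(b)} \quad f(x) = \begin{cases} x^3 - x + 3 & \text{for } x > 1 \\ 2x + 1 & \text{for } x < 1. \end{cases}$$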



5-6 Prove that the function $f$ defined by $f(x) = x^2 \sin(1/x)$ for $x \neq 0$ and
$f(0) = 0$ is differentiable at the origin and find the value of its derivative
there.

5-7 Prove from first principles that d/dx(cos ax) = — a sin ax. 

5-8 At which points in the stated intervals, if any, are the following functions 
/(x) non-differentiable: 

(a) fix) = x + sin 2x for < x < *; 




,^ rr s l X + !/* f0r X ^ °1 • 

(b) fix) = m the interval [-1,1]; 

(0 tor x = 0) 

, •> „ ^ (1 for x rational 1 

(c) /(x) = . . in the interval [0, 1]. 

10 for x irrationalj 

5-9 The function /(>:) is defined on the interval < x < 1 by the expression 

{sin 2x for < x < Jn- 
ax + b for \v < x <> 1. 

Deduce the values of a and b in order that the function should be continuous 
and have a continuous derivative at x = Jw. Interpret these conditions 
geometrically. 

5-10 Give an example of a continuous function / defined on the interval [0, 5], 
that is differentiable everywhere except at x = 1 at which point the left-hand 
derivative is 3 and the right-hand derivative is 5. That is to say, the tangent 
line to the graph drawn to the left of x = 1 has gradient 3 whilst the tangent 
line to the graph drawn to the right of x = 1 has gradient 5. 

Section 5-2 

5-11 By assuming Theorem 5-2 is also valid for rational n where necessary, find 
the derivatives of the following functions, stating at which points in their 
domains of definition, if any, they are non-differentiable: 

t \ rt \ i xX ' 3 + cos 3x > for * * °\ ■ u ■ i , 

(a)/(x)= } in the interval —$* <, x < v ; 

10, for x = 0J 

(b) $f(x) = x \sin 2x + x^{5/3}$ for $-1 < x < 3$;

(c) $f(x) = |\cos x|$ for $0 \le x \le \pi$.

5-12 Use Theorem 5-4 to give an inductive proof that, if $k_1, k_2, \ldots, k_n$ are constants
and $f_1(x), f_2(x), \ldots, f_n(x)$ are differentiable functions in the interval
$a \le x \le b$, then

$$\frac{d}{dx}\sum_{i=1}^{n} k_i f_i(x) = \sum_{i=1}^{n} k_i f_i'(x) \quad \text{in } a \le x \le b.$$

5-13 Differentiate the following functions:

(a) $y = x^{1/3} \sin x$;

(b) $y = (x^2 + 3x + 1)(1 + \cos 2x)$;

(c) $y = \sin 6x \cos 2x$;

(d) $y = (x^3 + 2x - 1)\cos 3x$.

5-14 Differentiate the following functions by making a repeated application of 
Theorem 5-5: 

(a) y = (1 + x 2 ) sin 7x cos 4x; 

(b) y = (1 + 2x2 + X 4)S. 

(c) $y = \cos^3 2x$;

(d) $y = (1 + x^3)^2 \sin^2 3x$.

5-15 Differentiate these composite functions:

(a) $y = (x^2 + 2x + 1)^{3/2}$;

(b) $y = (a + bx^3)^{1/3}$;

(c) $y = (2 + 3\sin 2x)^5$;

(d) $y = \sin(1 + 2x^3)$;

(e) $y = \sin[\sin(1 + x^2)]$;

(f) $y = \cos(1 + x^4)^{1/2}$.

5-16 Differentiate these quotients:

(a) $y = (x^2 + 3x + 7)/(x^4 + 1)$;

(b) $y = \dfrac{\sin(1 + x^2)}{x^4 + 2x^2 + 6}$;

(c) $y = \dfrac{1}{3\cos^3 x} - \dfrac{1}{\cos x}$;

(d) $y = \dfrac{\tan(1 - x^2 + x^4)}{\sin(x^2 + 1)}$;

(e) $y = \dfrac{1 + \sqrt{x}}{1 - \sqrt{x}}$.

5-17 Differentiate these functions: 
1 
(1 -3cosx) 2 ' 
x 

tan (1 + x 2 + x 4 ) 



(b)y=* 

(C) y = ^~ sin (1 + x 2 ) 

(d) j = cosec 2 (l + 3x); 

sin x + 2 cos x 

(e) y = ^ 5 

17 sin x — 2 cos x 



<S)y = 



/3x - 1\ 



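5-18 If the functions $f_1(x)$, $f_2(x)$, $g_1(x)$, and $g_2(x)$ are differentiable, show by direct
expansion that this theorem is true:

$$\frac{d}{dx}\begin{vmatrix} f_1(x) & f_2(x) \\ g_1(x) & g_2(x) \end{vmatrix} = \begin{vmatrix} f_1'(x) & f_2'(x) \\ g_1(x) & g_2(x) \end{vmatrix} + \begin{vmatrix} f_1(x) & f_2(x) \\ g_1'(x) & g_2'(x) \end{vmatrix}.$$

Apply this result to differentiate the determinants: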



(a) 



x 2 x sin x 
cosx 1 



fl>) 



(1 + x 2 cos x) (2 - sin 2 x) 
(1 - x 2 cos x) (2 + sin 2 x) 



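5-19 Suppose that the functions $f_{ij}(x)$, with $i, j = 1$, 2, or 3, are differentiable functions
of $x$. Prove, by means of Problem 5-18, that

$$\frac{d}{dx}\begin{vmatrix} f_{11}(x) & f_{12}(x) & f_{13}(x) \\ f_{21}(x) & f_{22}(x) & f_{23}(x) \\ f_{31}(x) & f_{32}(x) & f_{33}(x) \end{vmatrix} = \begin{vmatrix} f_{11}'(x) & f_{12}'(x) & f_{13}'(x) \\ f_{21}(x) & f_{22}(x) & f_{23}(x) \\ f_{31}(x) & f_{32}(x) & f_{33}(x) \end{vmatrix} + \begin{vmatrix} f_{11}(x) & f_{12}(x) & f_{13}(x) \\ f_{21}'(x) & f_{22}'(x) & f_{23}'(x) \\ f_{31}(x) & f_{32}(x) & f_{33}(x) \end{vmatrix} + \begin{vmatrix} f_{11}(x) & f_{12}(x) & f_{13}(x) \\ f_{21}(x) & f_{22}(x) & f_{23}(x) \\ f_{31}'(x) & f_{32}'(x) & f_{33}'(x) \end{vmatrix}.$$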



Section 5-3 

5-20 Use the intermediate value theorem to prove that if $f(x)$ is continuous on
$[a, b]$, with $f(a)$ and $f(b)$ having opposite signs, then there must be at least
one point $x = \xi$, with $a < \xi < b$, for which $f(\xi) = 0$.

5-21 Why is it not possible to conclude from the intermediate value theorem that
if $f(x) = 1/(1 - |x|)$ for $|x| \neq 1$ and $f(|1|) = 0.5$ then

(a) there is no point $x = \xi$ in the interval $[0, 6]$ for which $f(\xi) = 0$;

(b) yet there is a point $x = \eta$ in the interval $[-11, -2]$ for which $f(\eta) =
-0.5$? Identify the point on the $x$-axis giving rise to this functional value.

5-22 The function $f(x) = \tfrac{1}{3}x^3 - x + 2$ which is defined in the interval $(-\infty, \infty)$
has extrema at the points $x = 1$, $x = -1$. Identify their nature by considering
the behaviour of the function close to these points. Are they relative or
absolute extrema?

5-23 By considering the behaviour of f(x) = sin £x cos \x in the neighbourhood 
of x — in, show that the function attains an absolute maximum at that point. 

5-24 By considering the behaviour of $y = x^2 - 2x + 3$ in the neighbourhood of
$x = 1$, prove that this point gives rise to an absolute minimum of the
function. Find its value.

5-25 Find the critical points of the function $f(x) = x^3 - x^2 - 4x + 4$. Identify
the nature of the extrema associated with them by considering the functional
behaviour close to each of these points.

5-26 Find the critical point of the function f(x) = (x — i)x 213 and identify its 
nature. Do the points x = — 1, x = correspond to extrema of the function 
and, if so, of what type are they? 

5-27 Find the critical points of the function $f(x) = x^2(3 - x)^2$.

5-28 Identify the critical points and extrema of the function

$$f(x) = \begin{cases} x^2 - 3x + 2 & \text{for } 0 \le x \le 2.5 \\ x^2 - 7x + 12 & \text{for } 2.5 < x \le 5. \end{cases}$$

5-29 Apply Rolle's theorem to the following functions where it is applicable, and 
hence determine at how many points in the stated intervals [a, b] the following 
functions satisfy the result of that theorem: 
(a) /(*) = x* - 1 in [-2, 2]; 

(b) $f(x) = 1 + \sin x$ in $[-2\pi, 3\pi]$;

(c) $f(x) = 1/(1 + |x|)$ in $[-1, 1]$;

$$\text{(d)} \quad f(x) = \begin{cases} x^2 + 3x + 2 & \text{for } -1 \le x \le 0 \\ x^2 - 3x + 2 & \text{for } 0 < x \le 1. \end{cases}$$

5-30 Give an example of a simple continuous function $g(x)$ of the type illustrated
in Fig. 5-9 (b) in which $g'(\xi) = 0$ for some point $\xi$ in an interval $[a, b]$, but to
which Rolle's theorem is inapplicable because $g(x)$ is non-differentiable at
one point of that interval.






5-31 Show that the conditions of the mean value theorem apply to /(x) = x + 
sin x for the interval [0, £w]. Find the value of I in the statement of the 
theorem. 

5-32 In the proof of Theorem 5-12 a function $F(x)$ was constructed on the interval
$[a, b]$ which had the property that $F(a) = F(b) = 0$ and, in addition,
satisfied the other conditions of Rolle's theorem. Repeat the proof of Theorem
5-12, but this time with the requirement that $F(a) = F(b) = K$, where $K$ is
an arbitrary non-zero constant.

The following four problems illustrate how the mean value theorem may be 
used to estimate the behaviour of functions in closed intervals. 

5-33 Let $f(x)$ be a differentiable function having a monotonic increasing derivative
in the interval $[a, b]$. Then by writing the mean value theorem in the form
$f(b) = f(a) + (b - a)f'(\xi)$, with $a < \xi < b$, prove that $f(a) + (x - a)f'(a)
< f(x) < f(a) + (x - a)f'(b)$, for $a < x < b$. We shall agree to say that
these inequalities define upper and lower estimates of $f(x)$ in $[a, b]$. Show
also that if $f'(x)$ is monotonic decreasing, then the inequalities must be
reversed in the above expression.

5-34 Apply the result of Problem 5-33 to the function $f(x) = \sin x$ in the interval
$[0, \tfrac{1}{2}\pi]$ in order to prove that $0 < \sin x / x < 1$ for $0 < x < \tfrac{1}{2}\pi$.

5-35 Apply the result of Problem 5-33 to the function $f(x) = (1 + x^2)^{3/2}$ in the
interval $[1, 2]$, thereby obtaining upper and lower estimates for it in that
interval.

5-36 If fix) = 1 + x + (1/5) sin 2 x, show that /'(*) is monotonic increasing in 
the interval [— Jw, &*]. Hence apply the result of Problem 5-33 to/0:) to 
obtain upper and lower estimates for /(x) in that interval. Evaluate the 
inequalities for x = and x = i^ and compare the estimates with the exact 
result. 

5-37 Let the functions $f(x)$ and $g(x)$ be continuous in $[a, b]$ and differentiable in
$(a, b)$, with $g'(x)$ non-zero in $(a, b)$. Show that under these conditions Rolle's
theorem may be applied to the function $F(x)$ defined by $F(x) = f(a)g(a) - f(b)g(a)
+ [g(a) - g(b)]f(x) - [f(a) - f(b)]g(x)$, for $a \le x \le b$. Hence establish
the Cauchy extended mean value theorem.

5-38 By repeatedly applying L'Hospital's rule where necessary, evaluate the 
following indeterminate forms of the type 0/0: 

. tan ax „, ,. xcosx-sinx 
(a) lim — — ; (b) lim —5 ; 

x->0 X x-»0 X 



tanx- sinx /JA ,.„ x 3 - 2x 2 - x + 2_ 

x — sin x 
x 2 — sin 2 x 



, s ,. tan x — sin x ... ,. 

(c) lim : ; (d) lim 



(e) lim 

3_o x' sin-= x 

5-39 Evaluate the following indeterminate forms which are of the type co/co: 
(a) lim (7r/x)/cot *x/2; (b) lim tan x/tan 5x; 

(c)iim 3x2 + x ~ 1 ; (d) Hm _£!*iL_ 

W JHT«, x 2 + 2 ' w ^o*-cotx 






5-40 Explain the fallacy in this argument. The limit

$$\lim_{x\to\infty} \frac{x^2 + x\sin x + \sin x}{x^2}$$

does not exist because, applying Corollary 5-14 to L'Hospital's rule gives

$$\lim_{x\to\infty} \frac{x^2 + x\sin x + \sin x}{x^2} = \lim_{x\to\infty} \frac{2x + \sin x + x\cos x + \cos x}{2x} = \lim_{x\to\infty}\left[1 + \tfrac{1}{2}\cos x + \frac{\sin x + \cos x}{2x}\right] = 1 + \tfrac{1}{2}\lim_{x\to\infty}\cos x.$$

What is the true value of this limit?



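5-41 Indeterminate limits of the form $\infty - \infty$, $0\cdot\infty$ can be reduced to the types
0/0 or $\infty/\infty$ by means of the following simple devices. If the limit is of the
type $0\cdot\infty$ set $\lim_{x\to a} f(x) = 0$ and $\lim_{x\to a} g(x) \to \infty$, then

$$\lim_{x\to a}\,[f(x)g(x)] = \lim_{x\to a}\,[f(x)/(1/g(x))] \quad \text{(type 0/0)}$$

$$= \lim_{x\to a}\,[g(x)/(1/f(x))] \quad \text{(type } \infty/\infty\text{)}.$$

If the limit is of the type $\infty - \infty$ set $\lim_{x\to a} f(x) = 0$, $\lim_{x\to a} g(x) = 0$, then

$$\lim_{x\to a}\left[\frac{1}{f(x)} - \frac{1}{g(x)}\right] = \lim_{x\to a}\left[\frac{g(x) - f(x)}{f(x)g(x)}\right] \quad \text{(type 0/0)}$$

$$= \lim_{x\to a}\left[\frac{1/(f(x)g(x))}{1/(g(x) - f(x))}\right] \quad \text{(type } \infty/\infty\text{)}.$$

Apply these results to evaluate the following limits:

(a) $\lim_{x\to 0}\left(\dfrac{1}{\sin x} - \dfrac{1}{x}\right)$; (b) $\lim_{x\to 3}\left(\dfrac{1}{x - 3} - \dfrac{5}{x^2 - x - 6}\right)$;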



(c) lim (1 — cos x) cot x; 

x-*Q 

ttX 

(e) lim (1 — x) tan — ; 

x^l 2 



(d) lim xsin-; 

X— *-oo X 



(f) 



i im /_iL__^y 

x _j„ \^cot x 2 cos xj 



5-42 Verify the nature of the extrema in Problem 5-20 using the results of Theorem
5-15.

5-43 Verify the nature of the extrema in Problem 5-23 using the results of Theorem
5-15.

5-44 Apply to Problem 5-26 the modification to Theorem 5-15 indicated at the
end of Example 5-10 (b) to identify the behaviour of the function at the origin.

5-45 Apply Theorem 5-15 to Problem 5-26 to identify the extrema occurring in
the interval (0, 5].

5-46 If $y = f(x)$, where $f$ is a differentiable function, find the differential $dy$ given
that:

(a) $f(x) = x^6 + 3x^2 + x + 6$;
(b) $f(x) = x \sin(x^2 + 1)$;




(d) /(*) = (1 + X 2 ) 1 /*. 

5-47 Metals A and B have coefficients of linear expansion $\alpha$ and $\beta$, respectively.
That is to say, when the temperature changes by an amount $t$ from the
ambient value $T_0$, the linear dimensions of metal A change by a factor
$(1 + \alpha t)$, whilst those of metal B change by a factor $(1 + \beta t)$. Suppose that a
block of metal A contains a cylindrical cavity of height $H_0$ and radius $R_0$ at
temperature $T_0$ which is empty apart from a cylinder of metal B which has
height $h_0$ and radius $r_0$ at that same temperature. Obtain an approximate
expression for the small volume change $dV$ of the cavity between the cylinders
consequent upon a small change of temperature $dt$.

Section 5-4 

5-48 Compute the first and second derivatives of the functions /(x) listed below: 

(a) fix) = tan x; 

(b) f{x) = x 2 sin x; 

(c) f(x) = (1 + x)(3 sin x + cos 2x); 
id) fix) = (x» + I) 1 ' 2 ; 

(e) fix) = sin (1 + x 2 ); 

(f) /foe) -tan- 

5-49 Show that if $f(x) = \tfrac{1}{2}(3x^2 - 1)$, then

$$(1 - x^2)f''(x) - 2xf'(x) + 6f(x) = 0.$$

Equations of this type are called second order ordinary differential equations,
and this one is a special case of Legendre's differential equation.

5-50 If $f(x) = \tfrac{1}{2}(5x^3 - 3x)$ and $g(x) = \tfrac{1}{2}(3x^2 - 1)$, find the algebraic equation
connecting $f'(x)$, $g'(x)$, and $f(x)$.

5-51 Show that the function $f(x)$ defined below is continuous and has a continuous
first derivative at $x = 1$, but that it has a discontinuous second derivative
at that point:

$$f(x) = \begin{cases} x^4 + x^2 - x + 1 & \text{for } x < 1 \\ 2x^3 - x^2 + x & \text{for } x > 1. \end{cases}$$


5-52 Use Leibnitz's theorem to evaluate the third derivatives of the following 
functions: 

(a) f{x) = TTx'' (b) f(x) = ( * ? ~ 1} tan x ' 

(c) fix) = sin 2 x; (d) fix) = x 3 sec 2x. 

5-53 Apply Theorems 5-17 and 5-18 to locate and identify the extrema and points 
of inflection of the following functions, using your results to determine the 
gradients at the points of inflection : 

(a) fix) = 2x 3 + 3a: 2 - 12* + 5; 
(c) fix) = x \x - 12) 2 . 






5-54 Use the mean value theorem to prove that if $f(x)$ has a maximum at $x = x_0$,
then near to $x_0$, $f''(x) < 0$. Show that if $f(x)$ has a minimum at $x = x_0$, then
near to $x_0$, $f''(x) > 0$. Hence show that these tests may be used to identify
maxima and minima, even when $f'(x_0)$ does not exist.

5-55 Apply the results of Problem 5-54 to prove that the function $f(x) = (3x - 1)x^{2/3}$
has a maximum at the origin.

5-56 Determine the values of $a$ and $b$ in order that $f(x) = x^3 + ax^2 + bx + 1$
should have a point of inflection at $x = 2$ at which the gradient of the
tangent to the graph is $-3$.

Section 5-5 

5-57 Compute the derivatives/* and// given that: 
(a)/(x,j) = x 2 jy; 

(b) f(x, y) = 3x 2 y + (x + y) 2 x + 1 ; 

(c) f(x,y) = sin (.x*+y z ); 

(d) f{x, y) = x cos (1+ x 2 y 2 ). 

5-58 Given that

$$f(x, y) = x^3 + 3x^2 y + 4xy^2 + 2y^3$$
prove that $xf_x + yf_y = 3f$.
5-59 Compute the derivatives f x , fy, fz given that: 

(&)f(x,y,z) = x 2 yz + —j 

(b) /(x, y, z) = x cos yz + y cos xz + z cos xy; 

(c) f{x, y, z) = cos (x 2 + xy + yz). 
5-60 Show that if 

fix, y, Z) = ( x2 + yS + z 2)3/2 

then xfx + yfy + zf z = —If. 
5-61 Show that if 

fix,y,z) = x + j-l 

then/* +/„+/*= 1. 
5-62 Show that if

$$f = (x - y)(y - z)(z - x)$$
then $f_x + f_y + f_z = 0$.

Section 5-6 
5-63 Find the total differential d« given that u = f(x,y, z), where: 

ia)fix,y,z) = ^- z +xyz; 

(b) fix x y,z) = x sin iy 2 + z 2 ); 

(c) fix, y, z) = (l -x 2 -y 2 - z 2 ) 3 ! 2 . 



PROBLEMS / 265 



5-64 The speed of wave propagation $u$ in a transmission line with inductance $L$
and capacitance $C$ is given by the equation $u = (LC)^{-1/2}$. Relate the
differential $du$ to the differentials $dL$ and $dC$. How must $dL$ and $dC$ be related if
$u$ is to remain constant?

5-65 Apply the triangle inequality $|a + b| \le |a| + |b|$ to establish that, if
$u = f(x_1, x_2, \ldots, x_n)$ is differentiable with respect to each of its independent
variables $x_1, x_2, \ldots, x_n$, then

$$|du| \le \left|\frac{\partial f}{\partial x_1}\right| |dx_1| + \left|\frac{\partial f}{\partial x_2}\right| |dx_2| + \cdots + \left|\frac{\partial f}{\partial x_n}\right| |dx_n|.$$

A triangle with sides of length $a$, $b$, $c$ has area $A = \sqrt{s(s - a)(s - b)(s - c)}$,
where $2s = a + b + c$ is the perimeter. If $s$ is kept constant, find the largest
possible value that may be assumed by $|dA|$, the absolute value of the area
differential $dA$, consequent upon changes in the differentials $da$, $db$, and $dc$.
Apply the result to an equilateral triangle in which $a = b = c = 4$, when
changes $da = 0.01$, $db = 0.015$, and $dc = -0.025$ are made.

5-66 Compute $dy/dx$ from the following implicit relationships:

(a) $x^2 + y^2 = 4$;

(b) $x \sin xy = 1$;

(c) $x^2 y + 2xy^2 + y^3 = 2$.

5-67 Compute $\partial z/\partial x$ and $\partial z/\partial y$ given that:

(a) $x^2 + y^2 + z^2 = 1$;

(b) $xyz + \sin xz^2 = 2$;

(c) $x^2 - 2y^2 + 3z^2 - yz + y = 0$;

(d) $x \cos y + y \cos z + z \cos x = 1$.

Section 5-7 

5-68 Find the envelope of the family of curves with parameter $\alpha$
$$(x - \alpha)^2 + y^2 = \alpha^2/2.$$

5-69 Find the envelope of the family of curves with parameter $\alpha$

$$y = \alpha x + \frac{3}{2\alpha}.$$

5-70 When a particle is projected into the air with velocity $V$ at an angle $\theta$ to the
horizontal then, neglecting air resistance, its height $y$ when distant $x$ from the
point of projection is given by

$$y = x \tan \theta - \frac{g x^2}{2V^2 \cos^2 \theta}.$$

By regarding $\theta$ as a parameter, show that the envelope of the family of
trajectories for $0 < \theta < \pi$ is a parabola, and find its equation. This is usually
called the parabola of safety because no projectile can penetrate beyond it.

5-71 Find the envelope of the family of curves with parameter $\alpha$ specified by
$$(x - \alpha)^2 + (y + \alpha)^2 - \alpha^2 = 0.$$

5-72 Show that the envelope of the family of curves with parameter $\alpha$ defined by
$$x \cos \alpha + y \sin \alpha = 2$$
is a circle. Find its centre and radius. Interpret this family geometrically.






Section 5-8 

5-73 Find du/dt given that: 

(a) u = xy + sin (x 2 + y 2 ) with x = It, y = (1 + t 2 ) 1 ' 2 ; 

(b) u = (1 + x 2 + y 2 ) 3 ' 2 with x = t(l+ t), y = t 3 ; 



(c) « = 



with x = 3 cos f, y = 3 sin t, z = f 2 . 



(x 2 + j- 2 ) 1 ' 2 

5-74 If u = x 2 — xy + y 3 , compute du/dt at points on the curve specified para- 
metrically by the equations x = 2t + 1, y = t 2 + t — 2. 

5-75 Prove that if $u = f(2x^2 + y^2)$, where $f$ is a differentiable function, then

$$y \frac{\partial u}{\partial x} - 2x \frac{\partial u}{\partial y} = 0.$$

[Hint: Set $t = 2x^2 + y^2$.]
5-76 If $u = f(x, y)$, compute $du/dx$ given that:

(a) f{x, y) = (1 + xy + x 2 ) where y = tan (-) ; 

(b) f(x,y) = (1 + x 2 — y 2 ) 3 / 2 where y = cos 3x; 

(c) f(x, y) = x cos y + y cos x — 1 where _y = 1 + sin 2 x. 

5-77 If $u = f(x, y)$ and $g(x, y) = 0$ are differentiable functions, compute $du/dx$
given that:

(a) $f(x, y) = x^3 + 3xy + y^3$ and $g(x, y) = x \cos y + y \cos x - 2$;

(b) $f(x, y) = x^2 y^2 + \sin xy$ and $g(x, y) = x^2 - 2y^2 - 3$.

5-78 If u = x 2 — xy + y 2 , determine du/dx at points on the ellipse 2x 2 + 3y 2 = 1. 
Section 5-9

[Figure: the point $P(r, \theta, \varphi)$ in spherical polar coordinates]

5-79 In spherical polar coordinates a point $P$ is specified in space by giving the
ordered number triple $(r, \varphi, \theta)$. Here $r$ is the radial distance of $P$ from the
origin, $\varphi$ is the azimuthal angle of $P$ measured anti-clockwise from the
$x$-axis in the $(x, y)$-plane, and $\theta$ is the acute angle between the radius vector
drawn to $P$ from the origin and the $z$-axis. (See Figure.)

It is easily seen that:

$x = r \sin \theta \cos \varphi$,

$y = r \sin \theta \sin \varphi$,

$z = r \cos \theta$.

If $f(x, y, z)$ is differentiable with respect to $x$, $y$, and $z$, express $\partial f/\partial r$, $\partial f/\partial \theta$,
and $\partial f/\partial \varphi$ in terms of $\partial f/\partial x$, $\partial f/\partial y$, and $\partial f/\partial z$. Find their values given that
$f(x, y, z) = x^2 + 2xy + yz + z^2$.

5-80 Given that $f(x, y, z) = x^2 + xy + \sin yz$, compute $\partial f/\partial r$, $\partial f/\partial \theta$, and $\partial f/\partial z'$,
where $(r, \theta, z')$ are the cylindrical polar coordinates corresponding to the
point $(x, y, z)$.

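5-81 The notion of a Jacobian extends to transformations involving more than two
variables. If, in Theorem 5-22, $m = n = 3$, the Jacobian or functional
determinant is

$$\frac{\partial(x_1, x_2, x_3)}{\partial(\alpha_1, \alpha_2, \alpha_3)} =
\begin{vmatrix}
\dfrac{\partial x_1}{\partial \alpha_1} & \dfrac{\partial x_2}{\partial \alpha_1} & \dfrac{\partial x_3}{\partial \alpha_1} \\[2mm]
\dfrac{\partial x_1}{\partial \alpha_2} & \dfrac{\partial x_2}{\partial \alpha_2} & \dfrac{\partial x_3}{\partial \alpha_2} \\[2mm]
\dfrac{\partial x_1}{\partial \alpha_3} & \dfrac{\partial x_2}{\partial \alpha_3} & \dfrac{\partial x_3}{\partial \alpha_3}
\end{vmatrix}.$$

Evaluate the Jacobian $\partial(x, y, z)/\partial(r, \theta, z')$ for the transformation from
Cartesian to cylindrical polar coordinates.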

5-82 Use the definition in Problem 5-81 to evaluate the Jacobian
$\partial(x, y, z)/\partial(r, \varphi, \theta)$ for the transformation from Cartesian to spherical polar
coordinates.

5-83 Find the Jacobians of the following transformations, stating where, if at all, 
they vanish: 

(a) $x = 2u + 3v + 1$, $y = 3u - 2v - 1$;

(b) $x = u^2 - v^2$, $y = u^2 + v^2$;

(c) $x = u^2 + 2uv + v^2$, $y = u$.

5-84 Use Theorem 5-22 with $n = 2$, $m = 3$ to determine $\partial f/\partial u$, $\partial f/\partial v$, and $\partial f/\partial w$,
given that:

f=x 2 + iy 2 

where x = u 2 + v + w and y = uvw. 

5-85 If $u$ and $v$ are functions of $x$ and $y$ which satisfy $u^2 - v^2 + 2x + 3y = 0$ and
$uv + x - y = 0$, find $\partial u/\partial x$, $\partial u/\partial y$, $\partial v/\partial x$, and $\partial v/\partial y$ in terms of $u$ and $v$.

5-86 Prove that if z = /(«, v), where u = x + 3t, v = y — It, then 

— - -\—- 1 — 
8t 8x 8y 






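5-87 Show that if $u = 1/r^n$, where $r^2 = x^2 + y^2 + z^2$, then

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = \frac{n(n - 1)}{r^{n+2}}.$$

5-88 Prove that if $u = 2xy + xf(y/x)$, then

$$x \frac{\partial u}{\partial x} + y \frac{\partial u}{\partial y} = u + 2xy.$$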

Section 5-10

5-89 Which of the following implicit functions $f(x, y) = 0$ may be solved explicitly
for $y$ in the neighbourhood of the stated points $(x_0, y_0)$:

(a) f(x, y) = x 2 + y 3 + xy - 11 at (1, 2); 

(b) f(x,y) = (l-x 2 - y 2 y> 2 at (-1, 0); 

( c ) /(*>)>) = sin xy - l at (1, i^); 

(d) f{x, y) = y + sin xy - 2 at (i», 1) ? 

5-90 Compute dx/dy for each of the following relationships : 

(a) y = 1 -f x 2 + x sin x; 

(b) y = (1 - x + x 2 yi 2 ; 

(c) j = x + tan x. 

5-91 Differentiate these functions: 

(a) /(x) = x 2 arc sec (x/a); 

(b) fix) = (x 2 + x + l)/arc sin (x 2 - 2); 

(c) /(x) = (1 + x + ar.c cos 2x) 3 / 2 . 

5-92 Compute dy/dx and d 2 //dx 2 for each of the following parametrically defined 
curves: 

(a) x = t - 1, y = t 3 ; 

(b) x = cos 3 /, y — 2 sin 3 1 ; 

(c) , = arc cos _I_ , = arc sin ^^-^ 

(d) x = 2(cos t + t sin 0, y = 2(sin f — t cos f). 

5-93 Compute $dy/dx$ and $d^2y/dx^2$ at $t = \tfrac{1}{2}\pi$ if $x = t - \sin t$ and $y = 2(1 - \cos t)$.

5-94 Compute $d^3y/dx^3$ when $t = 1$, given that $x = 2t + 1$, $y = t(1 + t^2)$.

5-95 In Example 5-21, an envelope is specified in terms of a parameter a, and it 
comprises two curves corresponding to the + and — signs associated with y. 
Find the gradient of each of these curves at the origin (that is, corresponding 
to a = 0). 

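Section 5-11

5-96 Compute $\partial^2 z/\partial x^2$, $\partial^2 z/\partial x \partial y$, $\partial^2 z/\partial y \partial x$, and $\partial^2 z/\partial y^2$ for each of the following
functions and hence show that $\partial^2 z/\partial x \partial y = \partial^2 z/\partial y \partial x$: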

(a) z = (x 2 + y 2 yi 2 ; 

(b) z = x cos y + y cos x; 

(c) z = arc tan (y/x). 

5-97 Compute /c*(l, l),/ty(l, 1), and/^,(l, 1) given that 
fix,y) = U+x)Hl+y) 3 . 






Is $\partial^2 f/\partial x \partial y = \partial^2 f/\partial y \partial x$? Give reasons for your answer.
5-98 Given that 

f(x,y) = * 2 + j 2 

{ 1 for x = 0, y = 

compute 8 2 f\8x8y stating, with reasons, when it is equal to 8 2 fj8y8x. Is there 
any point at which this result is not true and, if so, what property of the 
function invalidates the result? [Hint: Consider limits taken along the line 
y = mx.] 

5 99 Show that if w = arc tan (x/y), then 8 2 w/8x 2 + 8 2 w\8y 2 + 8*w\8z 2 = 0. 

5-100 Given that V = arc tan 2xy\{x 2 — y % \ prove that 

8V 8V 8 2 V 8 2 V 

5-101 Compute 8 a zl8 x 8y 2 and 8 3 z/8x 2 8y given that z = x A y 2 + sin x 2 y. 



Exponential, hyperbolic, 
and logarithmic functions 



6-1 The exponential function

This chapter will be concerned primarily with the exponential function, 
first introduced in connection with limits in Section 3-3 and, thereafter, with 
a number of related functions. This time our approach will be to utilize both 
geometrical ideas and the elementary calculus to produce a more useful form 
of definition than that contained in Eqn (3-6). 

Let us seek a function E(x) equal to its own derivative and such that
E(0) = 1. Specifically, we must solve the equation

E'(x) = E(x) (6-1) 

which, because it involves the unknown function E{x) together with its 
derivative, is called a differential equation. This differential equation has the 
following simple geometrical interpretation : if the graph of the function E(x) 
is drawn, then the gradient of the graph at the point (x, E(x)) is equal to the 
functional value of E(x) itself. 

Perhaps it is worth remarking that Eqn (6-1), taken together with the
condition E(0) = 1, immediately implies that E(x) is a convex function for
x > 0. No deduction can yet be made about its behaviour for x < 0 though,
in fact, we shall shortly prove that E(x) is a convex function for all x.

As on previous occasions, our desired result is soonest obtained by 
studying an artificial function. The reason for considering the precise form 
of function to be used will become apparent once the result has been obtained. 

Suppose, for the moment, that there is a unique function E{x) defined by 
our requirements, and consider the new function F(x), where 

F(x) = E(x)E(a - x). (6-2) 

Then,

$$F'(x) = E(x) \frac{d}{dx}[E(a - x)] + E(a - x) \frac{d}{dx}[E(x)]$$

which, using the defining property (6-1), becomes

$$F'(x) = -E(x)E(a - x) + E(a - x)E(x) = 0.$$

Consequently, F(x) = constant but, as F(0) = E(0)E(a) = E(a), it follows
at once that F(x) = F(0) = E(a) for all x, and thus Eqn (6-2) takes the form 

E(x)E(a - x) = E(a). 



SEC 6-1 THE EXPONENTIAL FUNCTION / 271 

Alternatively, by replacing a by a + b and x by b this may be written 

E(a + b) = E(a)E(b). (6-3) 

Hence, if $n$ is a positive integer,

$$E(n) = E(n - 1)E(1) = E(n - 2)(E(1))^2 = \cdots = (E(1))^n. \tag{6-4}$$

If, now, we denote $E(1)$ by the symbol $e$, then Eqn (6-4) is equivalent to

$$E(n) = e^n. \tag{6-5}$$

The fact that $E(0) = 1$, taken together with Eqn (6-1), implies that $E(1) > 1$
and hence, via Eqn (6-5), that $e^n \to \infty$ as $n \to \infty$.

Again,

$$E(-n)E(n) = E(0) = 1,$$
so that

$$E(-n) = \frac{1}{E(n)} = \frac{1}{e^n} = e^{-n}. \tag{6-6}$$

Now we must extend this notation to take account of rational and
irrational $x$. Let us consider $E(x)$ for rational $x$, so that $x = p/q$ with $p$, $q$
integers. Then, using Eqn (6-5), we may write

$$\left[E\left(\frac{p}{q}\right)\right]^q = E\left(q \cdot \frac{p}{q}\right) = E(p) = e^p$$

and so

$$E\left(\frac{p}{q}\right) = e^{p/q}. \tag{6-7}$$

A similar argument using Eqn (6-6) shows that

$$E\left(-\frac{p}{q}\right) = e^{-p/q}. \tag{6-8}$$

Thus we have shown that for all rational $x$

$$E(x) = e^x. \tag{6-9}$$

To extend the definition of $E(x)$ to all the real numbers $x$ and not just to the
rationals, it only remains to add that for any irrational number $\xi$, we define
$E(\xi)$ by the equation $E(\xi) = e^{\xi}$.

Although the foregoing arguments have established the algebraic properties 
of E(x), they have still not provided a method of attributing an actual number 
to E(x) for any given value of x. Nor, indeed, are we certain that only one 
function E(x) exists that satisfies Eqn (6-1) and is such that E(0) = 1; that
is to say, is E(x) unique? This question will be answered in the affirmative 



272 / EXPONENTIAL, HYPERBOLIC, AND LOGARITHMIC FUNCTIONS CH 6 

immediately following the next stage of our argument. 

We now seek a series solution to our function $E(x)$ of the form

$$y = \sum_{r=0}^{\infty} a_r x^r \tag{6-10}$$

where, for simplicity, we have set $y = E(x)$ so that Eqn (6-1) now becomes

$$\frac{dy}{dx} = y \tag{6-11}$$

with $y(0) = 1$.

Assuming that this infinite series may be differentiated termwise, we have

$$\frac{dy}{dx} = \sum_{r=0}^{\infty} r a_r x^{r-1},$$

so that substituting for $y$ and $dy/dx$ in Eqn (6-11) yields

$$\sum_{r=0}^{\infty} r a_r x^{r-1} = \sum_{r=0}^{\infty} a_r x^r$$

or, equivalently,

$$\sum_{r=0}^{\infty} (r + 1) a_{r+1} x^r = \sum_{r=0}^{\infty} a_r x^r. \tag{6-12}$$

For this result to be unconditionally true for all x, as it must be to satisfy 
our definition of E(x), it follows that it must be an identity in x. This can 
only be possible if the coefficients of the corresponding powers of x on each 
side of Eqn (6-12) are identical. Hence, equating the coefficients of the 
general term involving x r , we find that 

$$(r + 1)a_{r+1} = a_r \tag{6-13}$$

for $r = 0, 1, 2, \ldots$.

As we require that $y(0) = 1$, it follows by setting $x = 0$ in Eqn (6-10)
that $a_0 = 1$. Using this result together with Eqn (6-13), which defines the
coefficients $a_r$ recursively, it is easily seen that

$$a_0 = 1, \quad a_1 = 1, \quad a_2 = \frac{1}{2!}, \quad a_3 = \frac{1}{3!}, \quad \ldots, \quad a_r = \frac{1}{r!}.$$

Substitution of these coefficients into Eqn (6-10) then shows that

$$E(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^r}{r!} + \cdots \tag{6-14}$$

whatever this expression may mean. 
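As an informal numerical illustration (not part of the original argument), the recurrence (6-13) can be run directly to generate the coefficients and to form partial sums of the series (6-14). The sketch below is in Python; the function names `exp_coefficients` and `partial_sum` are illustrative choices, and the comparison with the library value `math.e` is only a sanity check, not a proof of convergence.

```python
import math


def exp_coefficients(count):
    """Return [a_0, ..., a_{count-1}] from a_0 = 1 and (r + 1) a_{r+1} = a_r."""
    coeffs = [1.0]
    for r in range(count - 1):
        coeffs.append(coeffs[r] / (r + 1))
    return coeffs


def partial_sum(x, n):
    """Sum of the first n terms a_r x^r (r = 0, ..., n - 1) of the series."""
    return sum(a * x ** r for r, a in enumerate(exp_coefficients(n)))


if __name__ == "__main__":
    # Partial sums at x = 1 should approach the library value of e.
    for n in (5, 10, 20):
        print(n, partial_sum(1.0, n), math.e)
```

The coefficients produced this way agree with $1/r!$, and the partial sums at $x = 1$ settle quickly, which is consistent with the rapid decay of the terms.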

We have already remarked that the sum of an infinite series is to be
interpreted as the limit of the partial sums of the series, so let us now consider
the $n$th partial sum

$$S_n = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^{n-1}}{(n - 1)!} \tag{6-15}$$

of the function $E(x)$.

If $x > 0$ then $S_{n+1} - S_n = x^n/n! > 0$, so that $\{S_n\}$ is increasing. Is $\{S_n\}$
bounded? Let $R$ be an integer greater than $2x$, then $x/r < \tfrac{1}{2}$ for $r \ge R$, and so

$$\frac{x^r}{r!} = \frac{x}{1} \cdot \frac{x}{2} \cdots \frac{x}{R - 1} \cdot \frac{x}{R} \cdots \frac{x}{r} < \frac{x^{R-1}}{(R - 1)!} \left(\frac{1}{2}\right)^{r-R+1}.$$

Thus, for $n > R$,

$$S_n = \sum_{r=0}^{R-1} \frac{x^r}{r!} + \sum_{r=R}^{n-1} \frac{x^r}{r!} < \sum_{r=0}^{R-1} \frac{x^r}{r!} + \frac{x^{R-1}}{(R - 1)!} \sum_{r=R}^{n-1} \left(\frac{1}{2}\right)^{r-R+1},$$

which shows that $\{S_n\}$ is bounded, because the final geometric sum is less
than 1 for every $n$. Hence by the postulate of Section 3-2 it
follows that $\lim_{n \to \infty} S_n$ exists, and we now define the sum of the infinite series
(6-14) to be equal to the value of this limit. The infinite series (6-14) is thus
defined for all positive $x$.

As we have agreed to write $E(1) = e$, it follows from Eqn (6-14), by
setting $x = 1$, that

$$e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!} + \cdots \tag{6-16}$$

which, to 15 decimal places, has the numerical value

$$e = 2.718281828459045.$$

A modified argument shows that E(x) is also defined for all negative x, 
so that taking account of Eqn (6-9) we have proved the following result: 

Theorem 6-1 (exponential theorem) For all $x$ it is true that if

$$e = \sum_{n=0}^{\infty} \frac{1}{n!}$$

then

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}.$$

Let us now dispel any lingering doubts there may be about the uniqueness
of $e^x$. Suppose there is a different solution $z = E(x)$ of Eqn (6-1), with
$z(0) = 1$. Then we must have

$$\frac{dz}{dx} = z \tag{6-17}$$

and so, differencing Eqns (6-11) and (6-17), it is easily shown that

$$\frac{dw}{dx} = w, \tag{6-18}$$

where $w = y - z$. We also have $w(0) = y(0) - z(0) = 0$. Now solving
Eqn (6-18) by the same device as before, but this time setting

$$w = \sum_{r=0}^{\infty} b_r x^r, \tag{6-19}$$

we arrive at the recurrence relation

$$(r + 1)b_{r+1} = b_r, \tag{6-20}$$

for $r = 0, 1, 2, \ldots$, which is strictly analogous to Eqn (6-13).

However, setting $x = 0$ in Eqn (6-19) and using the condition $w(0) = 0$
we find that $b_0 = 0$, and so it follows from Eqn (6-20) that all the coefficients
$b_r$ are zero. Hence from Eqn (6-19) we see that $w(x) = 0$, and thus $y = z$,
showing that the function $e^x$ defined by Eqn (6-14) is unique.

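Finally, it remains for us to establish the equivalence of the function
$E(x)$ defined by Eqn (3-6) and the one denoted by the same symbols in Eqn
(6-14). We shall only give the details for positive $x$. Our best method is first
to expand the expression $[1 + (x/n)]^n$ by the binomial theorem, obtaining

$$\left(1 + \frac{x}{n}\right)^n = 1 + n \cdot \frac{x}{n} + \frac{n(n - 1)}{2!} \cdot \frac{x^2}{n^2} + \cdots + \frac{n(n - 1) \cdots 2 \cdot 1}{n!} \cdot \frac{x^n}{n^n}.$$

Then, setting $E_{n+1} = [1 + (x/n)]^n$, we rewrite the result in the form

$$E_{n+1} = 1 + x + \frac{x^2}{2!}\left(1 - \frac{1}{n}\right) + \frac{x^3}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) + \cdots + \frac{x^n}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{n - 1}{n}\right). \tag{6-21}$$

Defining the number $g(r, n)$ by

$$g(r, n) = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{r}{n}\right)$$

we next write Eqn (6-21) as

$$E_{n+1} = 1 + x + \frac{x^2}{2!} g(1, n) + \frac{x^3}{3!} g(2, n) + \cdots + \frac{x^n}{n!} g(n - 1, n). \tag{6-22}$$

Now the difference $S_{n+1} - E_{n+1}$ is

$$S_{n+1} - E_{n+1} = \frac{x^2}{2!}(1 - g(1, n)) + \frac{x^3}{3!}(1 - g(2, n)) + \cdots + \frac{x^n}{n!}(1 - g(n - 1, n))$$

which is obviously positive since $0 < g(r, n) < 1$.
However, it is readily seen that for any given $r$

$$\lim_{n \to \infty} g(r, n) = 1,$$

showing that

$$\lim_{n \to \infty} (S_{n+1} - E_{n+1}) = 0.$$

From Theorem 3-1 (a) it then follows that

$$\lim_{n \to \infty} E_{n+1} = \lim_{n \to \infty} S_{n+1} = e^x,$$

thereby establishing the equivalence of our two alternative definitions when
$x$ is positive. A similar argument also establishes the equivalence when $x$ is
negative.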
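The comparison between $E_{n+1} = [1 + (x/n)]^n$ and the partial sum $S_{n+1}$ can be observed numerically. The following Python sketch is only an illustration under the definitions used in the text; the function names `s_next` and `e_next` are hypothetical, and the sample value of $x$ is an arbitrary positive number.

```python
def s_next(x, n):
    """S_{n+1}: the sum of x^r / r! for r = 0, ..., n."""
    term, total = 1.0, 1.0
    for r in range(1, n + 1):
        term *= x / r          # builds x^r / r! from the previous term
        total += term
    return total


def e_next(x, n):
    """E_{n+1} = (1 + x/n)^n."""
    return (1.0 + x / n) ** n


if __name__ == "__main__":
    x = 1.5  # arbitrary positive sample point
    for n in (2, 10, 100, 1000):
        gap = s_next(x, n) - e_next(x, n)
        print(n, s_next(x, n), e_next(x, n), gap)
```

For positive $x$ the printed gap stays positive and shrinks as $n$ grows, in line with the argument above, although the convergence of $(1 + x/n)^n$ is visibly much slower than that of the series.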

Having now achieved a working definition for $E(x)$ we shall henceforth
always denote this function, known as the exponential function, either by $e^x$
or by $\exp(x)$.

It is worth formally recording the differentiability properties of this
function $e^x$. However, we first remark that if $f(x) = e^{g(x)}$, where $g(x)$ is a
differentiable function of $x$, then, setting $g(x) = u$ so that $f(x) = e^u$ and
using the chain rule in the form displayed in Eqn (5-6), we find that

$$\frac{df}{dx} = \frac{df}{du} \cdot \frac{du}{dx} = e^u g'(x) = g'(x) e^{g(x)}.$$

Theorem 6-2 If $f(x) = e^{g(x)}$, where $g(x)$ is a differentiable function of $x$,
then

$$\frac{d}{dx}\{e^{g(x)}\} = g'(x) e^{g(x)}.$$

In particular, if $g(x) = \alpha x$, where $\alpha$ is a constant, then,

$$\frac{d}{dx}(e^{\alpha x}) = \alpha e^{\alpha x}.$$

Let us now establish an important property of $e^x$. Consider the quotient
$e^x/x^p$, where $p$ is any positive integer. Then from Eqn (6-14) it follows that,
for $x > 0$,

$$\frac{e^x}{x^p} = \frac{1 + x + \dfrac{x^2}{2!} + \cdots + \dfrac{x^p}{p!} + \dfrac{x^{p+1}}{(p + 1)!} + \cdots}{x^p} > \frac{x}{(p + 1)!}.$$

Hence we have shown that

$$\lim_{x \to \infty} \frac{e^x}{x^p} > \lim_{x \to \infty} \frac{x}{(p + 1)!} \to \infty.$$

We have proved the following result:

Theorem 6-3 The function $e^x$ increases more quickly than any positive
power of $x$ as $x \to \infty$.
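The inequality $e^x/x^p > x/(p + 1)!$ used in this proof is easy to test numerically. The Python sketch below is an illustration only; the helper names `ratio` and `lower_bound` and the sample values of $x$ and $p$ are arbitrary choices made for this example.

```python
import math


def ratio(x, p):
    """The quotient e^x / x^p for x > 0."""
    return math.exp(x) / x ** p


def lower_bound(x, p):
    """The bound x / (p + 1)! obtained by keeping one term of the series."""
    return x / math.factorial(p + 1)


if __name__ == "__main__":
    for p in (1, 3, 5):
        for x in (1.0, 10.0, 50.0):
            print(p, x, ratio(x, p), lower_bound(x, p))
```

Because the bound grows linearly in $x$, the quotient must eventually exceed any prescribed value, whatever the fixed power $p$.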

We have already noted that $\lim_{x \to \infty} e^x \to \infty$, and as $e^x = 1/e^{-x}$ it follows
that $\lim_{x \to -\infty} e^x = 0$ or, equivalently, $\lim_{x \to \infty} e^{-x} = 0$. From Theorem 6-1 it follows
that the function $e^x$ is everywhere positive and since, by virtue of its definition,
its derivative is everywhere a strictly monotonic increasing function of $x$ it
must be a convex function. A graph of $e^x$ is shown in Fig. 6-1.

These last properties are frequently of help when studying limiting prob- 
lems involving the exponential function, as illustrated in the following 
examples. 




Fig. 6-1 The exponential function



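Example 6-1 Deduce the values of the following limits:

(a) $\displaystyle \lim_{x \to \infty} \frac{3e^x + x^3 + 1}{2e^x + x^7}$;

(b) $\displaystyle \lim_{x \to \infty} \frac{2e^{2x} + x^2 + 2}{3e^{3x} + 7}$;

(c) $\displaystyle \lim_{x \to 0} \frac{e^{ax} - e^{bx}}{2x}$.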

SEC 6-2 DIFFERENTIATION OF EXPONENTIAL FUNCTION / 277 

Solution (a) We have

$$\frac{3e^x + x^3 + 1}{2e^x + x^7} = \frac{3 + (x^3/e^x) + (1/e^x)}{2 + (x^7/e^x)}$$

and from Theorem 6-3 it then follows that all but the initial terms in numerator
and denominator must vanish as $x \to \infty$, so that

$$\lim_{x \to \infty} \frac{3e^x + x^3 + 1}{2e^x + x^7} = \frac{3}{2}.$$

(b) In this case we have

$$\frac{2e^{2x} + x^2 + 2}{3e^{3x} + 7} = \frac{2e^{-x} + (x^2/e^{3x}) + (2/e^{3x})}{3 + (7/e^{3x})}.$$

However, this time as $x \to \infty$ the whole numerator tends to zero whilst the
denominator approaches the value 3. Hence we have

$$\lim_{x \to \infty} \frac{2e^{2x} + x^2 + 2}{3e^{3x} + 7} = 0.$$

(c) This limit involves an indeterminate form of the type 0/0, so we
appeal to Theorem 5-14. Writing $f(x) = e^{ax} - e^{bx}$ and $g(x) = 2x$ we see that
$f(0) = g(0) = 0$, and

$$\lim_{x \to 0} \frac{f'(x)}{g'(x)} = \lim_{x \to 0} \frac{ae^{ax} - be^{bx}}{2} = \frac{a - b}{2}.$$

Hence, by the conditions of Theorem 5-14,

$$\lim_{x \to 0} \frac{e^{ax} - e^{bx}}{2x} = \lim_{x \to 0} \frac{ae^{ax} - be^{bx}}{2} = \frac{a - b}{2}.$$



6-2 Differentiation of functions involving the
exponential function

The exponential function occurs frequently in mathematics, and all of its 
differentiability properties follow from Theorem 6-2 combined with the 
fundamental differentiation theorems of Chapter 5. These results are straight- 
forward and are best illustrated by examples. The first example illustrates the 
ordinary differentiation of simple combinations of functions. 

Example 6-2 Differentiate the following functions $f(x)$:

(a) $f(x) = 2x^2 + 3e^{2x}$;

(b) $f(x) = x^2 e^{3x}$;

(c) $f(x) = 2 \exp (x^3 + 2x + 1)$;

(d) $f(x) = e^{2x}/(1 + e^x)$;

(e) $f(x) = \sin (1 + e^{2x})$.



Solution

(a) $$f'(x) = \frac{d}{dx}(2x^2 + 3e^{2x}) = 4x + 3\frac{d}{dx}(e^{2x})$$

and so

$$f'(x) = 4x + 6e^{2x}.$$

(b) $$f'(x) = e^{3x} \frac{d}{dx}(x^2) + x^2 \frac{d}{dx}(e^{3x})$$

so that

$$f'(x) = 2xe^{3x} + 3x^2 e^{3x}.$$

(c) This is a more complicated example of a composite function or, more
simply, of a function of a function. Set $u = x^3 + 2x + 1$ so that

$$f(x) = 2e^u.$$

Then, by the chain rule,

$$f'(x) = \frac{df}{du} \cdot \frac{du}{dx}$$

but

$$\frac{df}{du} = \frac{d}{du}(2e^u) = 2e^u = 2 \exp (x^3 + 2x + 1) \quad \text{and} \quad \frac{du}{dx} = 3x^2 + 2$$

so that, finally,

$$f'(x) = (6x^2 + 4) \exp (x^3 + 2x + 1).$$

(d) Writing $f(x)$ in the form

$$f(x) = e^{2x}(1 + e^x)^{-1}$$

we have

$$f'(x) = (1 + e^x)^{-1} \frac{d}{dx}(e^{2x}) + e^{2x} \frac{d}{dx}[(1 + e^x)^{-1}]$$

or

$$f'(x) = \frac{2e^{2x}}{1 + e^x} + e^{2x} \frac{d}{dx}[(1 + e^x)^{-1}].$$

To evaluate the last term set $1 + e^x = u$, so that we then need to evaluate

$$\frac{d}{dx}\left(\frac{1}{u}\right)$$

which, by the chain rule, is

$$\frac{d}{dx}\left(\frac{1}{u}\right) = \frac{d}{du}\left(\frac{1}{u}\right) \cdot \frac{du}{dx}.$$

However, $du/dx = e^x$ and $(d/du)(1/u) = -(1/u^2) = -1/(1 + e^x)^2$, showing
that

$$\frac{d}{dx}[(1 + e^x)^{-1}] = \frac{-e^x}{(1 + e^x)^2}.$$

Hence, combining our results, we find

$$f'(x) = \frac{2e^{2x}}{(1 + e^x)} - \frac{e^{3x}}{(1 + e^x)^2}.$$

(e) This is another composite function. Set $u = 1 + e^{2x}$, so that $f(x)
= \sin u$. Proceeding as before we then see that

$$f'(x) = \frac{df}{du} \cdot \frac{du}{dx} = 2e^{2x} \cos (1 + e^{2x}).$$
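The five derivatives of Example 6-2 can be spot-checked against a central finite difference. The sketch below is a numerical sanity check in Python, not a derivation; the helper `central_diff`, the dictionary `CASES`, the step size, and the sample point are all assumptions chosen for this illustration, and part (e) is taken with $u = 1 + e^{2x}$ as in the solution.

```python
import math


def central_diff(f, x, h=1e-6):
    """Symmetric finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Each entry pairs a function from Example 6-2 with its claimed derivative.
CASES = {
    "a": (lambda x: 2 * x**2 + 3 * math.exp(2 * x),
          lambda x: 4 * x + 6 * math.exp(2 * x)),
    "b": (lambda x: x**2 * math.exp(3 * x),
          lambda x: 2 * x * math.exp(3 * x) + 3 * x**2 * math.exp(3 * x)),
    "c": (lambda x: 2 * math.exp(x**3 + 2 * x + 1),
          lambda x: (6 * x**2 + 4) * math.exp(x**3 + 2 * x + 1)),
    "d": (lambda x: math.exp(2 * x) / (1 + math.exp(x)),
          lambda x: 2 * math.exp(2 * x) / (1 + math.exp(x))
          - math.exp(3 * x) / (1 + math.exp(x)) ** 2),
    "e": (lambda x: math.sin(1 + math.exp(2 * x)),
          lambda x: 2 * math.exp(2 * x) * math.cos(1 + math.exp(2 * x))),
}


def relative_error(f, df, x):
    """Relative gap between the formula df(x) and the finite difference."""
    exact = df(x)
    return abs(central_diff(f, x) - exact) / max(1.0, abs(exact))


if __name__ == "__main__":
    for name, (f, df) in CASES.items():
        print(name, relative_error(f, df, 0.3))
```

Small relative errors at a few sample points do not prove the formulas, but a sign slip or a dropped factor would show up immediately as a large discrepancy.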

Higher order derivatives are defined, as usual, by repeating the differentia- 
tion process the requisite number of times. 

Example 6-3 Find $f''(x)$, given that:

(a) $f(x) = x^2 e^{-2x}$;

(b) $f(x) = (x - 1)e^x$.

Solution (a) Proceeding as before we find that

$$f'(x) = 2xe^{-2x} - 2x^2 e^{-2x},$$

and

$$f''(x) = 2e^{-2x} - 4xe^{-2x} - 4xe^{-2x} + 4x^2 e^{-2x}.$$

Collecting terms we obtain

$$f''(x) = 2(1 - 4x + 2x^2)e^{-2x}.$$

(b) $$f'(x) = e^x + (x - 1)e^x = xe^x$$

so that

$$f''(x) = e^x + xe^x.$$

Partial differentiation of functions involving the exponential function is 
also straightforward, as the following example indicates. 

Example 6-4 Determine $f_x$, $f_y$, and $f_{xy}$, given that

$$f(x, y) = (x^2 + y^2) \exp (x^2 - y^2).$$

Solution

$$\frac{\partial f}{\partial x} = 2x \exp (x^2 - y^2) + (x^2 + y^2) \frac{\partial}{\partial x}[\exp (x^2 - y^2)]$$

$$= 2x \exp (x^2 - y^2) + 2x(x^2 + y^2) \exp (x^2 - y^2).$$

Notice that $\partial f/\partial x$ comprises the sum of everywhere continuous functions
and so is itself everywhere continuous.

$$\frac{\partial f}{\partial y} = 2y \exp (x^2 - y^2) + (x^2 + y^2) \frac{\partial}{\partial y}[\exp (x^2 - y^2)]$$

$$= 2y \exp (x^2 - y^2) - 2y(x^2 + y^2) \exp (x^2 - y^2).$$

The partial derivative $\partial f/\partial y$ is also seen to be everywhere continuous. Theorem
5-24 now tells us that $\partial^2 f/\partial x \partial y = \partial^2 f/\partial y \partial x$, so that we may differentiate either
$\partial f/\partial x$ or $\partial f/\partial y$ to arrive at $f_{xy}$. We choose to differentiate $f_x$ partially with
respect to $y$.

$$\frac{\partial^2 f}{\partial y \partial x} = -4xy \exp (x^2 - y^2) + 4xy \exp (x^2 - y^2) - 4xy(x^2 + y^2) \exp (x^2 - y^2),$$

whence

$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x} = -4xy(x^2 + y^2) \exp (x^2 - y^2).$$
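The mixed derivative found in Example 6-4 can be compared with a finite-difference estimate. The Python sketch below is a numerical check under stated assumptions: the four-point stencil in `mixed_partial`, the step size, and the sample point are illustrative choices and are not taken from the text.

```python
import math


def f(x, y):
    """The function of Example 6-4."""
    return (x * x + y * y) * math.exp(x * x - y * y)


def f_xy_formula(x, y):
    """The closed form obtained for the mixed second partial derivative."""
    return -4.0 * x * y * (x * x + y * y) * math.exp(x * x - y * y)


def mixed_partial(func, x, y, h=1e-4):
    """Four-point central-difference estimate of the mixed second derivative."""
    return (func(x + h, y + h) - func(x + h, y - h)
            - func(x - h, y + h) + func(x - h, y - h)) / (4.0 * h * h)


if __name__ == "__main__":
    x0, y0 = 0.7, 0.4  # arbitrary sample point
    print(mixed_partial(f, x0, y0), f_xy_formula(x0, y0))
```

The stencil is symmetric in $x$ and $y$, so it cannot distinguish the order of differentiation; it only confirms the common value that Theorem 5-24 guarantees.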

As a final illustration, let us consider an application of Theorem 5-21 to 
the exponential function. 

Example 6-5 Find $df/dt$, given that

$$f(x, y) = xy \exp (x^2 + 3y + 1),$$

with $x = \sin t$, $y = t^3 + 1$.

Solution Here we must use the chain rule formula for partial differentiation:

$$\frac{df}{dt} = \frac{\partial f}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial f}{\partial y} \cdot \frac{dy}{dt}.$$

Now

$$\frac{\partial f}{\partial x} = y \exp (x^2 + 3y + 1) + xy \frac{\partial}{\partial x} \exp (x^2 + 3y + 1)$$

$$= y \exp (x^2 + 3y + 1) + 2x^2 y \exp (x^2 + 3y + 1),$$

and thus

$$\frac{\partial f}{\partial x} = y(1 + 2x^2) \exp (x^2 + 3y + 1).$$

Similarly,

$$\frac{\partial f}{\partial y} = x \exp (x^2 + 3y + 1) + xy \frac{\partial}{\partial y} \exp (x^2 + 3y + 1)$$

$$= x \exp (x^2 + 3y + 1) + 3xy \exp (x^2 + 3y + 1),$$

and thus

$$\frac{\partial f}{\partial y} = x(1 + 3y) \exp (x^2 + 3y + 1).$$

We also have

$$\frac{dx}{dt} = \cos t \quad \text{and} \quad \frac{dy}{dt} = 3t^2,$$

and so $df/dt$ may now be found by direct substitution into the chain rule
formula, with the following result:

$$\frac{df}{dt} = [(t^3 + 1)(1 + 2 \sin^2 t) \cos t + 3t^2(3t^3 + 4) \sin t] \exp (4 + 3t^3 + \sin^2 t).$$
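The final expression of Example 6-5 can be checked by differentiating the composite function of $t$ numerically. The following Python sketch is only a sanity check; `f_of_t`, `dfdt_formula`, the step size, and the sample value of $t$ are names and values assumed for this illustration, with $y = t^3 + 1$ as implied by $dy/dt = 3t^2$.

```python
import math


def f_of_t(t):
    """f(x, y) = x y exp(x^2 + 3y + 1) along x = sin t, y = t^3 + 1."""
    x, y = math.sin(t), t**3 + 1.0
    return x * y * math.exp(x * x + 3.0 * y + 1.0)


def dfdt_formula(t):
    """The result obtained from the chain rule for partial differentiation."""
    s, c = math.sin(t), math.cos(t)
    bracket = (t**3 + 1.0) * (1.0 + 2.0 * s * s) * c \
        + 3.0 * t**2 * (3.0 * t**3 + 4.0) * s
    return bracket * math.exp(4.0 + 3.0 * t**3 + s * s)


def central_diff(func, t, h=1e-6):
    """Symmetric finite-difference approximation to the derivative."""
    return (func(t + h) - func(t - h)) / (2.0 * h)


if __name__ == "__main__":
    t0 = 0.5  # arbitrary sample value
    print(central_diff(f_of_t, t0), dfdt_formula(t0))
```

Agreement at a few values of $t$ is a quick way to catch algebra slips in the substitution step, where the factors $1 + 3y = 3t^3 + 4$ and $x^2 + 3y + 1 = 4 + 3t^3 + \sin^2 t$ are easy to get wrong.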

6-3 The logarithmic function 

Having introduced the exponential function there is now a need for an inverse 
function. The implicit function theorem (Theorem 5-23) tells us that such an 
inverse function exists and, furthermore, that it is differentiable whenever 
$(d/dx)(e^x) \neq 0$. However, this is always the case since we have already seen
that $(d/dx)(e^x) = e^x$, which is never zero for $x$ in the interval $-\infty < x < \infty$.
Hence a differentiable function, inverse to the exponential function, exists for 
all x. We call it the natural logarithmic function and denote it by log e whenever 
it is necessary to indicate that it has the base e. 

Definition 6-1 We define the natural logarithmic function $\log_e x$ by the
requirement that

$$y = \log_e x \iff x = e^y.$$

We may use this definition, together with Corollary 1 to the implicit
function theorem, to compute the derivative of $\log_e x$. As $dy/dx = 1/(dx/dy)$
and $x = e^y$, it follows that $dx/dy = e^y$, whence

$$\frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x}.$$

Now $e^y$ is essentially positive, so that

$$\frac{d}{dx}(\log_e x) = \frac{1}{x} \quad \text{for } x > 0. \tag{6-23}$$

It is obvious that $\log_e 1 = 0$ and, as $x$ increases strictly monotonically
with $y$, it also follows that $\log_e x \to +\infty$ as $x \to +\infty$, and $\log_e x \to -\infty$ as
$x \to 0$.

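Let us now prove that

$$\lim_{x \to \infty} \frac{\log_e x}{x^{\alpha}} = 0 \quad \text{for all } \alpha > 0.$$

As $x = e^y$ we have

$$\frac{\log_e x}{x^{\alpha}} = \frac{y}{e^{\alpha y}}$$

and so

$$\lim_{x \to \infty} \frac{\log_e x}{x^{\alpha}} = \lim_{y \to \infty} \frac{y}{e^{\alpha y}} = \frac{1}{\alpha} \lim_{y \to \infty} \frac{\alpha y}{e^{\alpha y}}.$$

Setting $u = \alpha y$ we arrive at

$$\lim_{x \to \infty} \frac{\log_e x}{x^{\alpha}} = \frac{1}{\alpha} \lim_{u \to \infty} \frac{u}{e^u} = 0,$$

by virtue of Theorem 6-3.

Collecting the previous results we arrive at the following theorem.

Theorem 6-4 If $y = \log_e x$, then

(a) $\dfrac{dy}{dx} = \dfrac{1}{x}$ for $x > 0$;

(b) $\displaystyle \lim_{x \to \infty} \frac{\log_e x}{x^{\alpha}} = 0$ for all $\alpha > 0$.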
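Part (b) of the theorem says that the logarithm grows more slowly than any positive power of $x$, however small the exponent. The Python sketch below tabulates the quotient for a few exponents; the function name `log_over_power` and the sample values are illustrative assumptions, and the table is only suggestive, since no finite table can establish a limit.

```python
import math


def log_over_power(x, alpha):
    """The quotient log(x) / x^alpha for x > 0 and alpha > 0."""
    return math.log(x) / x ** alpha


if __name__ == "__main__":
    for alpha in (1.0, 0.5, 0.1):
        row = [log_over_power(10.0 ** k, alpha) for k in (10, 50, 100, 200)]
        print(alpha, row)
```

For a small exponent such as 0.1 the quotient first rises (it peaks near $x = e^{1/\alpha}$) before it starts to decay, which is why very large values of $x$ are needed to see the limiting behaviour.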

Logarithms to other bases can be used if convenient. They are defined as 
follows. 



Definition 6-2 We define the logarithmic function to the base $c$, denoted
by $\log_c x$ where $c$ is a positive number, by the requirement that

$$y = \log_c x \iff x = c^y.$$

For reference purposes we record the following familiar properties of the 
logarithmic function, established in elementary courses. 



Basic properties of the logarithmic function

Let $\log_e$ and $\log_c$ represent logarithms to the bases $e$ and $c$ respectively, and
$a$, $b$, $r$ be real numbers; then:

(a) $\log_e ab = \log_e a + \log_e b$;

(b) $\log_e a^r = r \log_e a$;

(c) $\log_c a = \dfrac{\log_e a}{\log_e c}$;

(d) $\log_c e = \dfrac{1}{\log_e c}$.

Results (c) and (d) quoted above are immediately useful if it is necessary
to differentiate $\log_a x$. For we have

$$\log_a x = \frac{\log_e x}{\log_e a}$$

so that

$$\frac{d}{dx}(\log_a x) = \frac{1}{\log_e a} \cdot \frac{d}{dx}(\log_e x)$$

whence,

$$\frac{d}{dx}(\log_a x) = \frac{1}{x \log_e a} = \frac{\log_a e}{x}. \tag{6-24}$$

Let us now find the derivative of the function $a^x$, where $a$ is any positive
number. Notice first that, by virtue of Definition 6-1,

$$a = e^{\log_e a}$$

so that

$$a^x = (e^{\log_e a})^x = e^{x \log_e a}.$$

Now $\log_e a$ is simply a constant, so we have

$$\frac{d}{dx}(a^x) = \frac{d}{dx}(e^{x \log_e a}) = \log_e a \, e^{x \log_e a} = a^x \log_e a.$$

We have thus established the useful result

$$\frac{d}{dx}(a^x) = a^x \log_e a. \tag{6-25}$$

This result can also be obtained in another manner. We set

$$y = a^x$$

so that taking the natural logarithm gives

$$\log_e y = x \log_e a.$$

Differentiating this result with respect to $x$ we obtain

$$\frac{d}{dx}(\log_e y) = \frac{d}{dx}(x \log_e a)$$

or

$$\frac{1}{y} \cdot \frac{dy}{dx} = \log_e a,$$

and so

$$\frac{dy}{dx} = \frac{d}{dx}(a^x) = y \log_e a = a^x \log_e a.$$
For our final general result we consider the differentiation of the function 
y = log e g(x), where g(x) is a differentiable function. Setting u = g(x) so 
that y = loge u and using the chain rule gives 

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} = \frac{1}{u} \, g'(x)$$

so that, finally,

$$\frac{d}{dx}[\log_e g(x)] = \frac{g'(x)}{g(x)}. \tag{6-26}$$

Henceforth, unless otherwise stated, the natural logarithm will always be 
used, so for simplicity of notation we shall write log in place of log e . Often, in 
other texts, the notation In is used to denote the natural logarithmic function. 
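The three differentiation results (6-24), (6-25), and (6-26) can each be checked against a central finite difference. The Python sketch below is a numerical illustration only; the base `A = 3`, the sample function $g(x) = 1 + x^2$, and all function names are assumptions introduced for this example rather than anything taken from the text.

```python
import math

A = 3.0  # illustrative base, an assumption for this sketch


def central_diff(f, x, h=1e-6):
    """Symmetric finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def g(x):
    """Illustrative positive differentiable function for Eqn (6-26)."""
    return 1.0 + x * x


def g_prime(x):
    return 2.0 * x


def check(x):
    """Absolute errors for Eqns (6-24), (6-25), (6-26) at the point x > 0."""
    err_624 = abs(central_diff(lambda t: math.log(t, A), x)
                  - 1.0 / (x * math.log(A)))
    err_625 = abs(central_diff(lambda t: A ** t, x) - A ** x * math.log(A))
    err_626 = abs(central_diff(lambda t: math.log(g(t)), x)
                  - g_prime(x) / g(x))
    return err_624, err_625, err_626


if __name__ == "__main__":
    print(check(1.7))
```

Here `math.log(t, A)` is the logarithm to base `A`, and `math.log` with one argument is the natural logarithm, matching the convention adopted in the text of writing log for $\log_e$.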

Let us now examine some representative cases of limits involving 
logarithms. 



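Example 6-6 Evaluate the following limits:

(a) $\displaystyle \lim_{x \to \infty} \frac{\log x^3}{x}$;

(b) $\displaystyle \lim_{x \to \infty} \frac{\log a^x}{3x + 1}$ with $a > 0$;

(c) $\displaystyle \lim_{x \to \infty} \frac{1 + x^3 \log [2 + (1/x)]}{3x^3 + 2x^2 + 1}$;

(d) $\displaystyle \lim_{x \to 0} \frac{\log (1 + 3x)}{2x}$.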



Solution (a) We have

$$\frac{\log x^3}{x} = \frac{3 \log x}{x}$$

so that by Theorem 6-4 (b) it follows at once that

$$\lim_{x \to \infty} \frac{\log x^3}{x} = 0.$$

(b) We have

$$\frac{\log a^x}{3x + 1} = \frac{x \log a}{3x + 1}$$

and so

$$\lim_{x \to \infty} \frac{\log a^x}{3x + 1} = \lim_{x \to \infty} \frac{x \log a}{3x + 1} = \frac{1}{3} \log a.$$

(c) Using the result

$$\frac{1 + x^3 \log [2 + (1/x)]}{3x^3 + 2x^2 + 1} = \frac{(1/x^3) + \log [2 + (1/x)]}{3 + (2/x) + (1/x^3)}$$

it is at once apparent that

$$\lim_{x \to \infty} \frac{1 + x^3 \log [2 + (1/x)]}{3x^3 + 2x^2 + 1} = \frac{1}{3} \log 2.$$

(d) This is an indeterminate form of the type 0/0. It is easily verified that
Theorem 5-14 (L'Hospital's rule) is applicable so that

$$\lim_{x \to 0} \frac{f(x)}{g(x)} = \lim_{x \to 0} \frac{f'(x)}{g'(x)}$$

with $f(x) = \log (1 + 3x)$ and $g(x) = 2x$. As $f'(x) = 3/(1 + 3x)$ and $g'(x)
= 2$ it thus follows that

$$\lim_{x \to 0} \frac{\log (1 + 3x)}{2x} = \lim_{x \to 0} \frac{3}{2(1 + 3x)} = \frac{3}{2}.$$

Example 6-7 Determine the derivative $dy/dx$ for each of the following
functions $y = f(x)$ where:

(a) $f(x) = \log (3x^2 + 2)$;

(b) $f(x) = \log \tan 2x$;

(c) $f(x) = 3^x x^2$;

(d) $f(x) = (\sin x)^x$.



Solution (a) Here we must apply Eqn (6-26), with $g(x) = 3x^2 + 2$. As
$g'(x) = 6x$ it follows at once that

$$\frac{d}{dx}[\log (3x^2 + 2)] = \frac{6x}{3x^2 + 2}.$$

(b) Again we must use Eqn (6-26), but this time with $g(x) = \tan 2x$.
As $g'(x) = 2 \sec^2 2x$, we have that

$$\frac{d}{dx}(\log \tan 2x) = \frac{2 \sec^2 2x}{\tan 2x} = 2 \sec 2x \cdot \operatorname{cosec} 2x.$$

(c) We have

$$\frac{d}{dx}(3^x x^2) = 3^x \frac{d}{dx}(x^2) + x^2 \frac{d}{dx}(3^x)$$

which, by virtue of Eqn (6-25), becomes

$$\frac{d}{dx}(3^x x^2) = 2x \cdot 3^x + x^2 3^x \log 3$$

giving

$$\frac{d}{dx}(3^x x^2) = (2x + x^2 \log 3) 3^x.$$

(d) We set $y = (\sin x)^x$ and take logarithms to get

$$\log y = x \log \sin x.$$

Now, differentiating, we find that

$$\frac{1}{y} \cdot \frac{dy}{dx} = \log \sin x + x \frac{d}{dx}(\log \sin x)$$

or

$$\frac{dy}{dx} = (\sin x)^x (\log \sin x + x \cot x).$$

dx 

Partial differentiation involving the logarithmic function is equally 
straightforward. The final example illustrates a typical situation. 

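Example 6-8 If $u = x \log [1 + (x/y)] + y \log [1 + (y/x)]$, show that

$$x \frac{\partial u}{\partial x} + y \frac{\partial u}{\partial y} = u.$$

Solution We start by computing $\partial u/\partial x$. It is readily seen that

$$\frac{\partial u}{\partial x} = \log \left(1 + \frac{x}{y}\right) + x \frac{\partial}{\partial x} \log \left(1 + \frac{x}{y}\right) + y \frac{\partial}{\partial x} \log \left(1 + \frac{y}{x}\right)$$

$$= \log \left(1 + \frac{x}{y}\right) + \frac{x}{1 + (x/y)} \cdot \frac{1}{y} + \frac{y}{1 + (y/x)} \cdot \left(-\frac{y}{x^2}\right)$$

and so

$$\frac{\partial u}{\partial x} = \log \left(1 + \frac{x}{y}\right) + \frac{x}{x + y} - \frac{y^2}{x(x + y)}.$$

The symmetry of $x$ and $y$ in $u$ then allows us to interchange $x$ and $y$ in the
above partial derivative in order to derive $\partial u/\partial y$ without further calculation.
We obtain

$$\frac{\partial u}{\partial y} = \log \left(1 + \frac{y}{x}\right) + \frac{y}{x + y} - \frac{x^2}{y(x + y)}.$$

Hereafter, direct substitution verifies that

$$x \frac{\partial u}{\partial x} + y \frac{\partial u}{\partial y} = u.$$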
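The identity of Example 6-8 lends itself to a quick numerical check, using the partial derivatives obtained in the solution. The Python sketch below is an illustration under those formulas; the function names `u`, `u_x`, and `u_y` and the sample points are assumptions for this example, and the points are chosen with $x > 0$ and $y > 0$ so that both logarithms are defined.

```python
import math


def u(x, y):
    """u = x log(1 + x/y) + y log(1 + y/x), for x, y > 0."""
    return x * math.log(1.0 + x / y) + y * math.log(1.0 + y / x)


def u_x(x, y):
    """Partial derivative of u with respect to x, as derived in the solution."""
    return math.log(1.0 + x / y) + x / (x + y) - y * y / (x * (x + y))


def u_y(x, y):
    """Partial derivative with respect to y, by the symmetry of u."""
    return u_x(y, x)


def euler_residual(x, y):
    """x u_x + y u_y - u, which should vanish."""
    return x * u_x(x, y) + y * u_y(x, y) - u(x, y)


if __name__ == "__main__":
    for point in [(1.2, 0.7), (0.3, 2.5), (4.0, 4.0)]:
        print(point, euler_residual(*point))
```

The residual vanishes to rounding error because the two rational terms in $x u_x$ cancel exactly against those in $y u_y$, leaving only the logarithmic terms that make up $u$.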

6-4 Hyperbolic functions 

It is useful to define new functions called the hyperbolic sine, written sinh x, 
and the hyperbolic cosine, written cosh x, which are related to the exponential 
function. This is achieved as follows. 

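Definition 6-3 (hyperbolic functions) For all real $x$ we define $\sinh x$ and 
$\cosh x$ by the requirement that 

$$\sinh x = \frac{e^x - e^{-x}}{2}, \qquad \cosh x = \frac{e^x + e^{-x}}{2}.$$

It is an immediate consequence of the series for $e^x$ and $e^{-x}$ that 

$$\sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \frac{x^7}{7!} + \cdots + \frac{x^{2n+1}}{(2n+1)!} + \cdots, \tag{6-27}$$

and 

$$\cosh x = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \frac{x^6}{6!} + \cdots + \frac{x^{2n}}{(2n)!} + \cdots. \tag{6-28}$$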

Furthermore, it also follows from Definition 6-3 that sinh x is an odd function 
and cosh x is an even function. 

We now define the hyperbolic tangent, cotangent, cosecant, and secant, 
denoted by tanh x, coth x, cosech x, and sech x, as follows. 



Definition 6-4 

$$\tanh x = \frac{\sinh x}{\cosh x}; \qquad \coth x = \frac{\cosh x}{\sinh x}; \qquad \operatorname{cosech} x = \frac{1}{\sinh x}; \qquad \operatorname{sech} x = \frac{1}{\cosh x}.$$

We illustrate how useful identities may be established directly from 
Definition 6-3. Let us prove that 

sinh a cosh b + cosh a sinh b = sinh (a + b). 

Substituting for sinh a and cosh b from Definition 6-3 we obtain 

$$\frac{e^a - e^{-a}}{2} \cdot \frac{e^b + e^{-b}}{2} + \frac{e^a + e^{-a}}{2} \cdot \frac{e^b - e^{-b}}{2} = \frac{e^{(a+b)} - e^{-(a+b)}}{2},$$

which proves our result since $[e^{(a+b)} - e^{-(a+b)}]/2 = \sinh (a + b)$. Similar 
manipulation establishes the validity of all the identities listed below in 
Table 6-1. 

Table 6-1 Identities for hyperbolic functions 

$\sinh (x \pm y) = \sinh x \cosh y \pm \cosh x \sinh y$; (6-29) 

$\cosh (x \pm y) = \cosh x \cosh y \pm \sinh x \sinh y$; (6-30) 

$\cosh^2 x - \sinh^2 x = 1$; (6-31) 

$\tanh^2 x + \operatorname{sech}^2 x = 1$; (6-32) 

$1 + \operatorname{cosech}^2 x = \coth^2 x$. (6-33) 
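The identities (6-29) to (6-33) and the series (6-27), (6-28) lend themselves to a quick numerical spot check. The sketch below is an illustration added here, not the book's method: it uses only the standard `math` module, the function names are invented for the purpose, and the number of series terms is an arbitrary choice.

```python
import math

def sinh_series(x, terms=20):
    """Partial sum of series (6-27): x + x^3/3! + x^5/5! + ..."""
    return sum(x**(2 * m + 1) / math.factorial(2 * m + 1) for m in range(terms))

def cosh_series(x, terms=20):
    """Partial sum of series (6-28): 1 + x^2/2! + x^4/4! + ..."""
    return sum(x**(2 * m) / math.factorial(2 * m) for m in range(terms))

def identity_residuals(x, y):
    """Left side minus right side of (6-29)-(6-33); x must be non-zero."""
    sh, ch = math.sinh, math.cosh
    sech = 1 / ch(x)
    cosech = 1 / sh(x)
    coth = ch(x) / sh(x)
    return [
        sh(x + y) - (sh(x) * ch(y) + ch(x) * sh(y)),   # (6-29), upper sign
        ch(x + y) - (ch(x) * ch(y) + sh(x) * sh(y)),   # (6-30), upper sign
        ch(x)**2 - sh(x)**2 - 1,                       # (6-31)
        math.tanh(x)**2 + sech**2 - 1,                 # (6-32)
        1 + cosech**2 - coth**2,                       # (6-33)
    ]

if __name__ == "__main__":
    print(identity_residuals(0.7, -1.2))
    print(sinh_series(1.5) - math.sinh(1.5), cosh_series(1.5) - math.cosh(1.5))
```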



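Table 6-2 Derivatives of hyperbolic functions 

$\frac{d}{dx}(\sinh x) = \cosh x$; (6-34) 

$\frac{d}{dx}(\cosh x) = \sinh x$; (6-35) 

$\frac{d}{dx}(\tanh x) = \operatorname{sech}^2 x$; (6-36) 

$\frac{d}{dx}(\coth x) = -\operatorname{cosech}^2 x$; (6-37) 

$\frac{d}{dx}(\operatorname{cosech} x) = -\operatorname{cosech} x \coth x$; (6-38) 

$\frac{d}{dx}(\operatorname{sech} x) = -\operatorname{sech} x \tanh x$. (6-39) 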



Appeal to Definitions 6-3 and 6-4 together with the differentiability 
properties of the exponential function establishes Table 6-2, the table of 
derivatives. 

The behaviour of the hyperbolic functions is indicated graphically in 
Fig. 6-2 and for comparison the graphs of $y = \tfrac{1}{2} e^x$ and $y = \tfrac{1}{2} e^{-x}$ have been 
added to Fig. 6-2 (a). 



Functions inverse to the hyperbolic sine and cosine are introduced 
through the following definitions. 



Definition 6-5 The inverse hyperbolic sine, arcsinh x, and the inverse 
hyperbolic cosine, arccosh x, are defined by the relationships: 

(a) $y = \operatorname{arcsinh} x \iff x = \sinh y$; 

(b) $y = \operatorname{arccosh} x \iff x = \cosh y$. 

Their derivatives are readily obtained by direct use of this definition and 
we illustrate the process by deriving $d/dx$ arcsinh x. 

If $y = \operatorname{arcsinh} x$, then $x = \sinh y$ and so, differentiating with respect to 
$x$, we obtain 

$$1 = \cosh y \, \frac{dy}{dx},$$

and so 

$$\frac{dy}{dx} = \frac{1}{\cosh y} = \frac{1}{\sqrt{1 + \sinh^2 y}}$$

by virtue of identity (6-31) and the fact that $\cosh y$ is essentially positive. 
Hence, using the fact that $x = \sinh y$, we find that 

$$\frac{d}{dx}(\operatorname{arcsinh} x) = \frac{1}{\sqrt{1 + x^2}} \quad \text{for all } x.$$

In the case of $y = \operatorname{arccosh} x$ we must proceed with more care. 
If $y = \operatorname{arccosh} x$, so that $x = \cosh y$, then, as before, differentiating with 
respect to $x$ gives 

$$1 = \sinh y \, \frac{dy}{dx}$$

or, 

$$\frac{dy}{dx} = \frac{1}{\sinh y}.$$



Fig. 6-2 Hyperbolic functions: (a) y = sinh x and y = cosh x; (b) y = tanh x; 
(c) y = coth x; (d) y = cosech x; (e) y = sech x. 

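Now from the graph in Fig. 6-2 (a) we see that $\sinh y$ is positive if its argument 
$\operatorname{arccosh} x > 0$ and negative if $\operatorname{arccosh} x < 0$. Thus two different inverse 
functions must be defined. 

If $\operatorname{arccosh} x > 0$, then 

$$\frac{dy}{dx} = \frac{1}{\sinh y} = \frac{1}{\sqrt{\cosh^2 y - 1}} = \frac{1}{\sqrt{x^2 - 1}} \quad \text{for } x > 1.$$

Conversely, if $\operatorname{arccosh} x < 0$, then 

$$\frac{dy}{dx} = \frac{1}{\sinh y} = \frac{-1}{\sqrt{\cosh^2 y - 1}} = \frac{-1}{\sqrt{x^2 - 1}} \quad \text{for } x > 1.$$

Table 6-3 Derivatives of inverse hyperbolic functions 

$\frac{d}{dx}\left(\operatorname{arcsinh} \frac{x}{a}\right) = \frac{1}{\sqrt{x^2 + a^2}}$, for all $x$; (6-40) 

$\frac{d}{dx}\left(\operatorname{arccosh} \frac{x}{a}\right) = \frac{1}{\sqrt{x^2 - a^2}}$, for $\operatorname{arccosh} \frac{x}{a} > 0$ and $\frac{x}{a} > 1$; (6-41) 

$\frac{d}{dx}\left(\operatorname{arccosh} \frac{x}{a}\right) = \frac{-1}{\sqrt{x^2 - a^2}}$, for $\operatorname{arccosh} \frac{x}{a} < 0$ and $\frac{x}{a} > 1$; (6-42) 

$\frac{d}{dx}\left(\operatorname{arctanh} \frac{x}{a}\right) = \frac{a}{a^2 - x^2}$, for $x^2 < a^2$; (6-43) 

$\frac{d}{dx}\left(\operatorname{arccoth} \frac{x}{a}\right) = \frac{a}{a^2 - x^2}$, for $x^2 > a^2$; (6-44) 

$\frac{d}{dx}\left(\operatorname{arccosech} \frac{x}{a}\right) = \frac{-a}{x\sqrt{x^2 + a^2}}$, for all $x$; (6-45) 

$\frac{d}{dx}\left(\operatorname{arcsech} \frac{x}{a}\right) = \frac{-a}{x\sqrt{a^2 - x^2}}$, for $\operatorname{arcsech} \frac{x}{a} > 0$ and $0 < \frac{x}{a} < 1$; (6-46) 

$\frac{d}{dx}\left(\operatorname{arcsech} \frac{x}{a}\right) = \frac{a}{x\sqrt{a^2 - x^2}}$, for $\operatorname{arcsech} \frac{x}{a} < 0$ and $0 < \frac{x}{a} < 1$. (6-47) 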

Other inverse hyperbolic functions are defined similarly and it is left to 
the reader to verify the remaining entries in Table 6-3. (In many books 
the inverse function is denoted by a superscript $-1$, so that $\sinh^{-1} x$ is written 
in place of arcsinh x, etc.) 
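The derivatives of arcsinh x and of the two branches of arccosh x can be compared with difference quotients. The sketch below is an added illustration under stated assumptions: Python's `math.acosh` returns the non-negative branch, so the negative branch is taken here to be its negative, and the helper names are hypothetical.

```python
import math

def central_diff(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

def arcsinh_gap(x):
    """Difference quotient of arcsinh minus 1/sqrt(1 + x^2)."""
    return central_diff(math.asinh, x) - 1 / math.sqrt(1 + x * x)

def arccosh_gap(x, positive_branch=True):
    """Difference quotient of the chosen arccosh branch minus +/- 1/sqrt(x^2 - 1).

    Requires x > 1, where the derivative exists.
    """
    sign = 1.0 if positive_branch else -1.0
    branch = lambda t: sign * math.acosh(t)
    return central_diff(branch, x) - sign / math.sqrt(x * x - 1)

if __name__ == "__main__":
    print(arcsinh_gap(-0.8), arccosh_gap(2.0), arccosh_gap(2.0, positive_branch=False))
```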

The following examples are representative of the limiting and differenti- 
ability problems encountered with hyperbolic functions. 

Example 6-9 

(a) Evaluate $\lim_{x \to \infty} \dfrac{5 \sinh 3x + x e^x}{4 e^{3x}}$; 

(b) Find $f'(x)$ if $f(x) = \sinh (x^2 + 3x + 1)^{1/2}$; 

(c) Find $f'(x)$ given that $f(x) < 0$ is given by $f(x) = \operatorname{arccosh} (\sin^2 x)$; 

(d) Determine $f_x$ and $f_y$ given that $f(x, y) = xy \cosh (x^2 + y^2)$. 

Solution (a) From Definition 6-3 it is easily seen that for large $x$ 

$$\sinh 3x \approx \tfrac{1}{2} e^{3x}.$$

Hence, applying the usual arguments, it follows at once that 

$$\lim_{x \to \infty} \frac{5 \sinh 3x + x e^x}{4 e^{3x}} = \lim_{x \to \infty} \frac{(5 e^{3x}/2) + x e^x}{4 e^{3x}} = \frac{5}{8}.$$

(b) 

$$f'(x) = [\cosh (x^2 + 3x + 1)^{1/2}] \cdot \frac{1}{2} \cdot \frac{(2x + 3)}{(x^2 + 3x + 1)^{1/2}}$$

so that 

$$f'(x) = \frac{(2x + 3)}{2 (x^2 + 3x + 1)^{1/2}} \cosh (x^2 + 3x + 1)^{1/2}.$$

(c) Set $y = \operatorname{arccosh} (\sin^2 x)$ so that 

$$\sin^2 x = \cosh y.$$

Differentiation with respect to $x$ then gives 

$$2 \sin x \cos x = \sinh y \, \frac{dy}{dx}$$

or 

$$\frac{dy}{dx} = \frac{2 \sin x \cos x}{\sinh y}.$$

As we are told that $y = f(x) < 0$ it then follows that 

$$\frac{dy}{dx} = \frac{-2 \sin x \cos x}{\sqrt{\cosh^2 y - 1}} = \frac{-2 \sin x \cos x}{\sqrt{\sin^4 x - 1}}$$

provided $\sin x \neq 1$. 

(d) 

$$\frac{\partial f}{\partial x} = y \cosh (x^2 + y^2) + xy \frac{\partial}{\partial x} \cosh (x^2 + y^2) = y \cosh (x^2 + y^2) + 2x^2 y \sinh (x^2 + y^2).$$

Similarly, 

$$\frac{\partial f}{\partial y} = x \cosh (x^2 + y^2) + 2x y^2 \sinh (x^2 + y^2).$$
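The limit in Example 6-9 (a) can be watched numerically. In the following sketch, which is an added illustration with a hypothetical function name, the ratio is evaluated for increasing $x$; the values of $x$ are kept moderate because $e^{3x}$ overflows double precision for $x$ above roughly 230.

```python
import math

def ratio(x):
    """(5 sinh 3x + x e^x) / (4 e^{3x}), the quotient of Example 6-9 (a)."""
    return (5 * math.sinh(3 * x) + x * math.exp(x)) / (4 * math.exp(3 * x))

if __name__ == "__main__":
    for x in (1, 2, 5, 10, 20):
        print(x, ratio(x))   # the values settle towards 5/8
```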

6-5 Exponential function with a complex argument 

If we formally replace $x$ by $ix$ in the series expansion of $e^x$ in Theorem 6-1 
we obtain 

$$e^{ix} = 1 + ix - \frac{x^2}{2!} - i\frac{x^3}{3!} + \frac{x^4}{4!} + i\frac{x^5}{5!} - \frac{x^6}{6!} - \cdots + i^n \frac{x^n}{n!} + \cdots.$$

Clearly $e^{ix}$ is a complex number for any fixed real number $x$ and, writing 
it in the form $e^{ix} = C(x) + iS(x)$, it follows by equating real and imaginary 
parts that 

$$C(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots + (-1)^n \frac{x^{2n}}{(2n)!} + \cdots$$

and 

$$S(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^n \frac{x^{2n+1}}{(2n+1)!} + \cdots.$$

Thus, in fact, if $x$ is regarded as a variable, $S(x)$ and $C(x)$ are functions of 
$x$ and $e^{ix}$ is, in some sense yet to be properly defined, a function of a complex 
variable. 

Assuming that the series for $C(x)$ may be differentiated term by term it is 
easily verified that 

$$C'(x) = -x + \frac{x^3}{3!} - \frac{x^5}{5!} + \frac{x^7}{7!} - \cdots + (-1)^{n+1} \frac{x^{2n+1}}{(2n+1)!} + \cdots.$$

Next, differentiating $C'(x)$ again with respect to $x$ yields 

$$C''(x) = -1 + \frac{x^2}{2!} - \frac{x^4}{4!} + \frac{x^6}{6!} - \cdots + (-1)^{n+1} \frac{x^{2n}}{(2n)!} + \cdots,$$

showing that in fact 

$$C''(x) = -C(x).$$

Now, setting $x = 0$ in the series for $C(x)$ and $C'(x)$, we find that 

$$C(0) = 1 \quad \text{and} \quad C'(0) = 0.$$

Hence the function $C(x)$ is seen to be the solution of the special differential 
equation 

$$\frac{d^2 y}{dx^2} + y = 0$$

with $y(0) = 1$ and $y'(0) = 0$. 

This same differential equation with the conditions on $y$ was encountered 
in Example 5-13 (a), where it was derived as the equation satisfied by $y = \cos x$ 
and its derivatives. Thus the function $C(x)$ is, in reality, the function $\cos x$. 
An analogous argument establishes that $S(x) = \sin x$. On account of this 
identification of $C(x)$ and $S(x)$ we may write 

$$e^{ix} = \cos x + i \sin x. \tag{6-48}$$

As a direct consequence of replacing $x$ by $-x$ in Eqn (6-48) and using the 
fact that $\cos x$ is even, but $\sin x$ is odd, we find that 

$$e^{-ix} = \cos x - i \sin x. \tag{6-49}$$

Combination of Eqns (6-48) and (6-49) leads to the following definitions of 
the sine and cosine functions. 

Definition 6-6 

$$\sin x = \frac{e^{ix} - e^{-ix}}{2i} \quad \text{and} \quad \cos x = \frac{e^{ix} + e^{-ix}}{2}.$$

Comparison of Eqns (4-15) and (6-48) shows that $e^{ix}$ represents a complex 
number of unit modulus lying on the unit circle drawn about the origin. 
The argument of $e^{ix}$ is $x$. 
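Eqn (6-48) and the series for $C(x)$ and $S(x)$ can be illustrated with the standard `cmath` module. The sketch below is an addition for illustration only; the function names and the number of series terms are assumptions, and the argument check is valid for $-\pi < x \le \pi$, the range in which `cmath.phase` reports the principal value.

```python
import cmath
import math

def C(x, terms=20):
    """Partial sum of the real part of the series for e^{ix}."""
    return sum((-1)**n * x**(2 * n) / math.factorial(2 * n) for n in range(terms))

def S(x, terms=20):
    """Partial sum of the imaginary part of the series for e^{ix}."""
    return sum((-1)**n * x**(2 * n + 1) / math.factorial(2 * n + 1) for n in range(terms))

def euler_gap(x):
    """|e^{ix} - (cos x + i sin x)|, which Eqn (6-48) says is zero."""
    return abs(cmath.exp(1j * x) - complex(math.cos(x), math.sin(x)))

def modulus_and_argument(x):
    """Modulus and principal argument of e^{ix}."""
    z = cmath.exp(1j * x)
    return abs(z), cmath.phase(z)

if __name__ == "__main__":
    print(C(1.2) - math.cos(1.2), S(1.2) - math.sin(1.2))
    print(euler_gap(0.75), modulus_and_argument(0.75))
```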

Slightly more general than Eqn (6-48) is the complex number $e^{(x+iy)}$ for, 
by the property of indices together with Eqn (6-48), we have 

$$e^{(x+iy)} = e^x \cdot e^{iy} = e^x (\cos y + i \sin y), \tag{6-50}$$

showing that 

$$|e^{(x+iy)}| = e^x \quad \text{and} \quad \arg e^{(x+iy)} = y. \tag{6-51}$$

Thus the modulus-argument form of a general non-zero complex number $z$ 
may be written 

$$z = r e^{i\theta},$$

where 

$$r = |z| \quad \text{and} \quad \theta = \arg z. \tag{6-52}$$

This is, of course, an alternative form of Eqn (4-15). 

As it is true for any exponent $\alpha$ that $(a^x)^\alpha = a^{\alpha x}$, it follows that $(e^{ix})^\alpha = 
e^{i\alpha x}$, so that from Eqn (6-48) we arrive at the result 

$$(\cos x + i \sin x)^\alpha = \cos \alpha x + i \sin \alpha x. \tag{6-53}$$

This is simply de Moivre's theorem (Theorem 4-2) for any exponent $\alpha$ and 
not just for the integral values used in the first proof of this important theorem. 
To close, let us apply these results to give an alternative derivation of the 
results of Example 4-10, and also to express $\sin^n \theta$ and $\cos^n \theta$ in terms of 
sums involving $\sin r\theta$ and $\cos r\theta$, as promised in that example. As in Chapter 
4, the argument is best presented by example. 

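Example 6-10 

(a) Express $\sin n\theta$ and $\cos n\theta$ in terms of $\cos \theta$ and $\sin \theta$. Deduce the 
form taken by the result when $n = 4$. 

(b) Express $\cos^7 \theta$ in terms of $\cos r\theta$. 

(c) Express $\sin^5 \theta$ in terms of $\sin r\theta$. 

Solution 
(a) 

$$\cos n\theta = \operatorname{Re}(e^{in\theta}) = \operatorname{Re}[(e^{i\theta})^n] = \operatorname{Re}[(\cos \theta + i \sin \theta)^n],$$

$$\sin n\theta = \operatorname{Im}(e^{in\theta}) = \operatorname{Im}[(e^{i\theta})^n] = \operatorname{Im}[(\cos \theta + i \sin \theta)^n].$$

When $n = 4$ we have 

$$(\cos \theta + i \sin \theta)^4 = \cos^4 \theta + 4i \cos^3 \theta \sin \theta - 6 \cos^2 \theta \sin^2 \theta - 4i \cos \theta \sin^3 \theta + \sin^4 \theta.$$

Hence 

$$\cos 4\theta = \operatorname{Re}[(\cos \theta + i \sin \theta)^4] = \cos^4 \theta - 6 \cos^2 \theta \sin^2 \theta + \sin^4 \theta$$

and 

$$\sin 4\theta = \operatorname{Im}[(\cos \theta + i \sin \theta)^4] = 4(\cos^3 \theta \sin \theta - \cos \theta \sin^3 \theta).$$

(b) From Definition 6-6 we may write 

$$\cos^7 \theta = \left(\frac{e^{i\theta} + e^{-i\theta}}{2}\right)^7.$$

Expanding the right-hand side by the Binomial theorem, simplifying and 
grouping terms, we obtain 

$$\cos^7 \theta = \frac{1}{2^6}\left(\frac{e^{7i\theta} + e^{-7i\theta}}{2} + 7\,\frac{e^{5i\theta} + e^{-5i\theta}}{2} + 21\,\frac{e^{3i\theta} + e^{-3i\theta}}{2} + 35\,\frac{e^{i\theta} + e^{-i\theta}}{2}\right).$$

Again using Definition 6-6, we see that this immediately simplifies to 

$$\cos^7 \theta = \frac{1}{64}(\cos 7\theta + 7 \cos 5\theta + 21 \cos 3\theta + 35 \cos \theta).$$

(c) From Definition 6-6 we may write 

$$\sin^5 \theta = \left(\frac{e^{i\theta} - e^{-i\theta}}{2i}\right)^5.$$

Expanding the right-hand side, simplifying and grouping terms gives 

$$\sin^5 \theta = \frac{1}{2^4}\left(\frac{e^{5i\theta} - e^{-5i\theta}}{2i} - 5\,\frac{e^{3i\theta} - e^{-3i\theta}}{2i} + 10\,\frac{e^{i\theta} - e^{-i\theta}}{2i}\right).$$

Again appealing to Definition 6-6, we see that this immediately reduces to 

$$\sin^5 \theta = \frac{1}{16}(\sin 5\theta - 5 \sin 3\theta + 10 \sin \theta).$$

A variant of the method used here and in example (b) above is to be found 
outlined in Problems 6-37 and 6-38. 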
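The expansions found in Example 6-10 (b) and (c) can be confirmed at a sample angle, and the same binomial argument gives a general multiple-angle formula for $\cos^n \theta$. The sketch below is an added illustration; `cos_power` is a hypothetical helper implementing $2^{-n} \sum_k \binom{n}{k} \cos (n - 2k)\theta$, which follows from expanding $[(e^{i\theta} + e^{-i\theta})/2]^n$ and taking the real part. It relies on `math.comb`, available from Python 3.8.

```python
import math

def cos_power(n, theta):
    """cos(theta)**n rebuilt from cosines of multiple angles."""
    total = sum(math.comb(n, k) * math.cos((n - 2 * k) * theta) for k in range(n + 1))
    return total / 2**n

def cos7_formula(t):
    """Right-hand side of the result of Example 6-10 (b)."""
    return (math.cos(7 * t) + 7 * math.cos(5 * t)
            + 21 * math.cos(3 * t) + 35 * math.cos(t)) / 64

def sin5_formula(t):
    """Right-hand side of the result of Example 6-10 (c)."""
    return (math.sin(5 * t) - 5 * math.sin(3 * t) + 10 * math.sin(t)) / 16

if __name__ == "__main__":
    t = 0.9
    print(cos7_formula(t) - math.cos(t)**7)
    print(sin5_formula(t) - math.sin(t)**5)
    print(cos_power(7, t) - math.cos(t)**7)
```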

PROBLEMS 

Section 6-1 

6-1 Solve the differential equation $dy/dx = y$, with $y(0) = c$, as in Section 6-1, by 
substituting 

$$y = \sum_{r=0}^{\infty} a_r x^r.$$

Hence deduce that, provided $c \neq 0$, the differential equation has the non- 
trivial solution $y = c e^x$. 

6-2 The function $y = e^{-x}$ satisfies the differential equation $dy/dx = -y$, with 
$y(0) = 1$. Use the method of the previous problem to verify the series solution. 

6-3 It follows from the argument preceding Eqn (6-16) in Section 6-1 that 



< S n - Sr < 



x" 



(R - D! 

where the integer $R > 2x$. Use this result to deduce the least number of terms 
that must be included in the series expansion of $e^2$ in order that the error 
involved is less than 0.01. 






6-4 Evaluate the following limits: 

4e 2j + xe x + 3 



(a) lim 



(b) lim 



^ x 5xe 3x + c*+ 1 

(x 2 + l)e 3 * + e z + 1, 



(2x 2 - 3x + l)e 3 * 
(2 - x 2 )f + 3 _ 



V^T+v + xy**' 

,„ ,. 3(2 e~ 3 * + x 2 + 1 ) 
(d) ^o 4e* + 2*+l • 



6-5 Make use of the series expansion of $e^x$ to evaluate the following limits and 
verify your result by using Theorem 5-14: 

(a) lim — ; 

1 — e~ x 

(b) lim -^-r-; 
3-_>o sin Ax 

, , .. & -\-x 



6-6 Differentiate the following functions: 
(a) $f(x) = 2e^x \cos x$; 
(b) $f(x) = e^{3x} \arcsin x$; 
(c) $f(x) = e^x/x^2$; 
(d) $f(x) = e^{x \sin x}$. 



6-7 Differentiate the following functions: 
(a) $f(x) = \arcsin e^{2x}$; 
(b) $f(x) = \sqrt{x e^x + x}$; 
(c) $f(x) = \sin (x e^x + 2)$; 
(d) $f(x) = (e^x - 1)/(e^x + 1)$. 



Section 6-2 

6-8 Differentiate the following functions: 
(a) $f(x) = 3 \exp [-(x^2 + x + 1)]$; 

(b) $f(x) = e^{\sin 2x}$; 

(c) $f(x) = \cos [\exp (x \sin x + 2)]$. 



6-9 Find the second derivatives of the following functions: 

(a) $f(x) = e^{3x^2}$; 
(b) $f(x) = \sin (1 + e^{2x})$; 
(c) $f(x) = e^{\sin x}$. 



6-10 Consider the function $f(x)$ defined as follows: 

$$f(x) = \begin{cases} e^{-1/x^2} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases}$$

Clearly the differentiability properties of this function at the origin must be 
deduced directly from the definition of a derivative. To deduce these properties 
show first that for $x \neq 0$, it follows that 

$$f'(x) = \frac{2}{x^3} e^{-1/x^2}.$$

Then, by using Definition 5-2 together with Theorem 6-3, prove that $f'(0) = 0$, 
and hence deduce that 

$$\lim_{x \to 0} f'(x) = f'(0) = 0.$$

Finally, deduce that in general, 

$$f^{(n)}(x) = e^{-1/x^2} \times (\text{Polynomial in } 1/x),$$

and hence by using an inductive argument prove that $f^{(n)}(0) = 0$ for all $n$. 
This is an example of a function which is capable of differentiation an arbitrary 
number of times for all $x$, and yet which has every derivative equal to zero at 
one point of its domain of definition. 

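6-11 Find $\partial f/\partial x$ and $\partial f/\partial y$, given that 

$$f(x, y) = e^{\sin (y/x)}.$$

6-12 Show that $u = xy + x e^{y/x}$ satisfies the equation 

$$x \frac{\partial u}{\partial x} + y \frac{\partial u}{\partial y} = xy + u.$$

6-13 Find $df/dt$, given that 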

f(x, y) = e 2 * +7 *' 
where x = cos t, y = sin t. 



6-14 Find $\partial f/\partial u$ and $\partial f/\partial v$ if 

$$f(x, y) = 2 \arctan \frac{x}{y}$$

with $x = u \sin v$ and $y = u \cos v$. 



Section 6-3 

6-15 Evaluate the following limits: 

, , ,. (x- l)logx 2 

(a) lifn v \ s ; 

x— *cc X 

(b) lim **££»■. 

X-+K, Ax + 1 






(c) lim 

z—0 



log (3 sin x) — log [(1 + x) sin x] 



It* - 1 
(d) lim [log (3* + 1) - log (2x + 5)]; 



(e) lim 

J-*CO 



log (1 + 2e*) 



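6-16 Let $f(x)$ and $g(x)$ be functions such that $\lim_{x \to a} f(x) = 0$ and $\lim_{x \to a} g(x) = 0$ but 
$\lim_{x \to a} f(x)/g(x) = k$. Then 

$$\lim_{x \to a} \frac{\log [1 + f(x)]}{g(x)} = \lim_{x \to a} \log [1 + f(x)]^{1/g(x)} = \lim_{x \to a} \log \{[1 + f(x)]^{1/f(x)}\}^{f(x)/g(x)}.$$

However, it follows from Chapter 3, Section 3 that $\lim_{x \to a} [1 + f(x)]^{1/f(x)} = e$, 
so that 

$$\lim_{x \to a} \frac{\log [1 + f(x)]}{g(x)} = \lim_{x \to a} \log e^{f(x)/g(x)} = \lim_{x \to a} \frac{f(x)}{g(x)} = k.$$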



Apply this result to evaluate the following limits : 

log(l +lx) _ 

2x 
log (1 + 3 sinx) 



(a) , im !5S0±H). 

x^o 2x 



(b) lim 

x-*0 

(c) lim 

*— 



log[l - 2sin 2 (x/2)], 



use your result to deduce lim (cos x) 1,x . 

a-—0 

6-17 Apply Theorem 5-14 to evaluate the limits in Problem 6-16. 



6-18 Differentiate the following functions: 
(a) $f(x) = \log (x^3 + 7x^2 + 2)$; 

(b) $f(x) = \log \sin 2x$; 

I x — V 

(c) f{x) = log cos 



6-19 If $y = [f(x)]^{g(x)}$ then, taking the natural logarithm, 

$$\log y = g(x) \log f(x).$$

Hence, differentiating with respect to $x$, it follows that 

$$\frac{dy}{dx} = \left[ g'(x) \log f(x) + \frac{g(x)}{f(x)} f'(x) \right] [f(x)]^{g(x)}.$$

Use this result to differentiate the following functions: 

(a) $y = x^x$; 

(b) $y = (\sin 2x)^x$; 

(c) $y = x^{\sin x}$; 

(d) $y = 10^{\log \sin x}$. 



6-20 If $u = x \log (1 + x/y) + y \log (1 + y/x)$ show that 

$$x^2 \frac{\partial^2 u}{\partial x^2} = y^2 \frac{\partial^2 u}{\partial y^2}.$$

6-21 Find the total derivative $dz$ given that 

$$z = \log (x^2 + 2y^2).$$

6-22 Show that the function 

$$f(x, y) = \arctan (y/x) + \log (x^2 + y^2)$$

satisfies the equation 

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0.$$

6-23 By taking logarithms deduce $\partial u/\partial x$, $\partial u/\partial y$, and $\partial u/\partial z$ if $u = (xy)^z$. 

Section 6-4 

6-24 Use the definitions to establish the form taken by : 

(a) sinh x; 

(b) cosh x; 

(c) tanh x; 

when x is large. Distinguish between x large and positive and x large and 
negative. 

6-25 Prove by means of the definition that 

(cosh x + sinh x) n = cosh nx + sinh nx. 

6-26 Use the definitions to verify any three of the identities contained in Table 6-1. 

6-27 Prove by means of the definitions that: 

(a) $2 \sinh x \cosh y = \sinh (x + y) + \sinh (x - y)$; 

(b) $2 \cosh x \cosh y = \cosh (x + y) + \cosh (x - y)$; 

(c) $2 \sinh x \sinh y = \cosh (x + y) - \cosh (x - y)$. 

6-28 Verify any three of the entries in Table 6-2. 

6-29 Verify the derivatives of arccosech $x/a$ and arcsech $x/a$ given in Table 6-3. 
6-30 Evaluate the following limits, using the series (6-27) and (6-28) where necessary: 

(a) $\lim_{x \to \infty} \dfrac{x^3 \cosh 2x + e^x}{(2x^3 + x + 1)e^{2x} + x^3 e^{-2x}}$; 

(b) $\lim_{x \to -\infty} \dfrac{x^3 \cosh 2x + e^x}{(2x^3 + x + 1)e^{2x} + x^3 e^{-2x}}$; 

(c) $\lim_{x \to 0} \dfrac{\sinh ax}{x}$; 

(d) $\lim_{x \to 0} \dfrac{1 - \cosh 2x}{3x^2}$. 



6-31 Differentiate the following functions: 

(a) f(x) = sinh 2x cosh 2 x; 

(b) f(x) = exp (1 + cosh 3x); 






(c)/X*)=«log(tanhx); 

(d) f{x) = arcsech (x 2 + J) if /(*) > 0; 

(e) f{x) = cosh (sin 2x). 

6-32 Evaluate dujdx and Suj8y given that : 

(a) uix, y) = sin x cosh xy\ 

(b) «(x, j) = sinh (x 2 + x sin y + 3y 2 ) ; 

(c) u(x,y) = xcosh^+2^ 

Section 6-5 

6-33 Establish by means of the definitions that: 

(a) $\sin (iz) = i \sinh z$; 

(b) $\cos (iz) = \cosh z$; 

(c) $\sinh (iz) = i \sin z$; 

(d) $\cosh (iz) = \cos z$. 

6-34 Given that $a$, $b$ are positive real numbers, deduce four trigonometric identities 
by equating real and imaginary parts in each of the following results 

$$e^{ia} e^{ib} = e^{i(a+b)} \quad \text{and} \quad e^{ia} \cdot e^{-ib} = e^{i(a-b)}.$$

6-35 Express the following complex numbers in the form $r e^{i\theta}$: 
(a) 1 + i; (b) 1 - /; (c) -80V3 - 1); 
(d) (-1+/) 8 ; (e) (5+ 140/(4 + /). 

6-36 Show by means of de Moivre's theorem that: 

(a) $32 \cos^6 \theta = 10 + 15 \cos 2\theta + 6 \cos 4\theta + \cos 6\theta$; 

(b) $\sin 7\theta = 7 \sin \theta - 56 \sin^3 \theta + 112 \sin^5 \theta - 64 \sin^7 \theta$. 

6-37 Verify that if $z = e^{i\theta}$, then 

$$\cos \theta = \frac{1}{2}\left(z + \frac{1}{z}\right) \quad \text{and} \quad \sin \theta = \frac{1}{2i}\left(z - \frac{1}{z}\right)$$

and, more generally, 

$$\cos r\theta = \frac{1}{2}\left(z^r + \frac{1}{z^r}\right) \quad \text{and} \quad \sin r\theta = \frac{1}{2i}\left(z^r - \frac{1}{z^r}\right).$$

By replacing $\cos \theta$ and $\sin \theta$ by their equivalent expressions involving $z$, 
make use of these results to express $\cos^2 \theta \sin^3 \theta$ in terms of $\sin n\theta$. 

6-38 Use the method of Problem 6-37 to express $\sin^8 \theta$ in terms of $\cos n\theta$. 

6-39 Consider the function $\cosh z$, where $z = x + iy$. Then, using Definition 6-3, 
deduce that $\cosh z = 0$ when $z = (2n + 1)\pi i/2$, with $n = 0, \pm 1, \pm 2, \ldots$. 
Use the results of Problem 6-33 to deduce the zeros of $\cos z$. 

6-40 Consider the function $\sin z$, where $z = x + iy$. Then, using Definition 6-6, 
deduce that $\sin z = 0$ when $z = n\pi$, with $n = 0, \pm 1, \pm 2, \ldots$. Use the 
results of Problem 6-33 to deduce the zeros of $\sinh z$. 



Fundamentals of 
integration 



7-1 Definite integrals and areas 

The work of this chapter is concerned with the theory of the operation known 
as integration, which occupies a central position in the calculus. The connec- 
tion between differentiation and integration is basic to the whole of the 
calculus and is contained in a result we shall prove later known as the funda- 
mental theorem of calculus. Once again, limiting operations will play an 
essential part in the development of our argument. In fact we will show not 
only how they enable a satisfactory general theory of integration to be 
established, but also how they provide a tool, albeit a clumsy one, for the 
actual integration of functions. However, aside from a number of simple but 
important examples, the practical details of the evaluation of integrals of 
specific classes of function will be deferred until Chapter 8. 

We begin by seeking to determine the shaded area $I$ of Fig. 7-1 which is 
interior to the region bounded above and below by the curve $y = f(x)$ and 
the x-axis, respectively, and to the left and right by the lines $x = a$, $x = b$. 

This approach will lead naturally to what is called the definite integral of 
$f(x)$ over the interval $a \le x \le b$, and it illustrates a valuable geometrical 
interpretation of the process of integration. Although we use the definite 
integral to give precise meaning to the notion of the area contained within a 
closed curve, this appeal to geometry is not actually necessary when defining 
an integral. Indeed, we shall also show how a purely analytical definition of 
a definite integral, quite independent of any geometrical arguments, may be 
formulated. 

Fig. 7-1 Area $I$ defined by $y = f(x)$. 

Let $f(x)$ be a non-negative continuous function defined in the closed 
interval $[a, b]$ and consider, for a moment, the conceptual problem that arises 
when trying to determine the area $I$ defined by it in Fig. 7-1. The only simple 
plane geometrical figure for which the concept of area is defined in an 
elementary and unambiguous manner is the rectangle, so that we shall seek to 
define the area $I$ in terms of the limit of a sum of rectangular areas. It should 
perhaps be remarked at this point that the derivation of the formula $\pi r^2$ for 
the area of a circle of radius $r$ involves the concept of integration, although 
this is invariably avoided in any first encounter by the employment of 
arguments that are at best only plausible. 

We shall start our discussion from the postulates that (a) the area of a 
rectangle is given by the product length $\times$ breadth, (b) the area of the union 
of two non-overlapping rectangles is the sum of their separate areas, and 
(c) if a rectangle is divided into two parts by a curve, then the sum of the 
separate non-rectangular areas comprising these two parts is equal to the 
area of the rectangle. 

On the basis of postulate (c), we at once see that the area $I$ in Fig. 7-1 
exceeds the rectangular area ABEF, but is less than the rectangular area 
ACDF. Letting $m$, $M$ denote, respectively, the minimum and maximum 
values attained by $f(x)$ in $[a, b]$, this result becomes 

$$m(b - a) \le I \le M(b - a). \tag{7-1}$$

This inequality, although interesting, must obviously be refined if it is 
ever to lead to the actual value of $I$. In principle, our approach will be simple, 
for we shall begin by dividing $[a, b]$ into $n$ adjacent sub-intervals in each of 
which an inequality of type (7-1) will apply, after which we shall use postulate 
(b) to find better upper and lower bounds for $I$. 

Specifically, we start by choosing any sequence of $n + 1$ numbers $x_0$, 
$x_1, \ldots, x_n$ subject only to the requirements that $x_0 = a$, $x_n = b$, and 

$$x_0 < x_1 < \cdots < x_{n-1} < x_n.$$

The sequence $\{x_r\}_{r=0}^{n}$ so defined is called a partition $P$ of the interval $[a, b]$, 
and for any given value of $n$ it is obviously not unique. Next, on each 
sub-interval $[x_{i-1}, x_i]$, let the function $f(x)$ attain a minimum value $m_i$ and a 
maximum value $M_i$ and denote the length of the $i$th sub-interval by $\Delta_i$, so 
that 

$$\Delta_i = x_i - x_{i-1}.$$

We now define numbers $\underline{S}_P$ and $\overline{S}_P$ called, respectively, the lower and upper 
sums taken over the partition $P$, by the expressions 

$$\underline{S}_P = m_1 \Delta_1 + m_2 \Delta_2 + \cdots + m_n \Delta_n = \sum_{r=1}^{n} m_r \Delta_r \tag{7-2}$$

and 

$$\overline{S}_P = M_1 \Delta_1 + M_2 \Delta_2 + \cdots + M_n \Delta_n = \sum_{r=1}^{n} M_r \Delta_r. \tag{7-3}$$

Clearly, as Figs. 7-2 (a), (b) illustrate, $\underline{S}_P$ and $\overline{S}_P$ are, respectively, under- and 
over-estimates of the area $I$. 

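The fact that $\underline{S}_P \le \overline{S}_P$ is apparent on geometrical grounds, but it also 
follows without appeal to geometry by considering the difference 

$$\overline{S}_P - \underline{S}_P = (M_1 - m_1)\Delta_1 + (M_2 - m_2)\Delta_2 + \cdots + (M_n - m_n)\Delta_n. \tag{7-4}$$

In this equation we have, by definition, $\Delta_r > 0$ and $M_r \ge m_r$ for $r = 1$, 
$2, \ldots, n$, so that 

$$\overline{S}_P - \underline{S}_P \ge 0 \quad \text{or,} \quad \underline{S}_P \le \overline{S}_P,$$

and thus by postulate (c), 

$$\underline{S}_P \le I \le \overline{S}_P. \tag{7-5}$$

Fig. 7-2 (a) Shaded area represents lower sum $\underline{S}_P$; (b) shaded area represents 
upper sum $\overline{S}_P$. 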

It would seem reasonable to suppose that as the number $n$ of points in a 
partition increases, provided the lengths of all intervals shrink to zero, the 
limit of both the lower and upper sums must be $I$, the desired area. We prove 
this in two stages, first considering the effect on the lower and upper sums of 
the refinement of the partition $P$ by the inclusion of extra points. 

It will suffice here to consider only the effect of the inclusion of one extra 
point $x_r'$ between $x_{r-1}$ and $x_r$ in the partition $P$. The resulting partition $P'$ is 
called a refinement of $P$, in the sense that although $P'$ has more points than $P$, 
all points of $P$ are also points of $P'$. 

Suppose that in the intervals $[x_{r-1}, x_r']$ and $[x_r', x_r]$ the function $f(x)$ 
attains the minimum values $m_r'$ and $m_r''$, respectively. Then the effect of the 
extra point is to replace the term $m_r(x_r - x_{r-1})$ in the lower sum $\underline{S}_P$ by the 
sum $m_r'(x_r' - x_{r-1}) + m_r''(x_r - x_r')$ thereby generating the sum $\underline{S}_{P'}$ 
appropriate to the refinement $P'$ of the partition $P$. As it must be true that $m_r \le m_r'$ 
and $m_r \le m_r''$, it thus follows that 

$$m_r'(x_r' - x_{r-1}) + m_r''(x_r - x_r') \ge m_r(x_r - x_{r-1}),$$

whence 

$$\underline{S}_P \le \underline{S}_{P'}. \tag{7-6}$$

Identical reasoning involving the maxima $M_r'$ and $M_r''$ attained by $f(x)$ in 
the intervals $[x_{r-1}, x_r']$ and $[x_r', x_r]$ establishes that 

$$\overline{S}_{P'} \le \overline{S}_P. \tag{7-7}$$

Fig. 7-3 Effect of refinement of a partition: (a) area inequality on interval 
$[x_{r-1}, x_r]$ of $P$; (b) area inequality on interval $[x_{r-1}, x_r]$ of $P'$. 

The inequalities leading to results (7-6) and (7-7) are illustrated 
geometrically in Fig. 7-3. Thus in Fig. 7-3 (a) the area inequalities associated with the 
interval $[x_{r-1}, x_r]$ of $P$ are displayed, whilst in Fig. 7-3 (b) the corresponding 
situation is displayed for the refinement $P'$ produced by inserting an 
additional point $x_r'$ in $[x_{r-1}, x_r]$. 

The further refinement of the partition $P'$ by the inclusion of additional 
points only serves to reinforce results (7-6) and (7-7). We have thus 
established that if the partitions $P_1, P_2, \ldots, P_m$ are successive refinements of the 
partition $P$, then 

$$m(b - a) \le \underline{S}_{P_1} \le \underline{S}_{P_2} \le \cdots \le \underline{S}_{P_m} \le I \le \overline{S}_{P_m} \le \overline{S}_{P_{m-1}} \le \cdots \le \overline{S}_{P_1} \le M(b - a). \tag{7-8}$$

Expressed in words, the effect of refinement of a partition is to increase the 
corresponding lower sum and to decrease the corresponding upper sum, so 
that $\{\underline{S}_{P_r}\}$ is a monotonic increasing sequence of numbers, and $\{\overline{S}_{P_r}\}$ is a 
monotonic decreasing sequence of numbers. 
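The behaviour described by (7-8) can be observed with a small computation. The sketch below is an added illustration and makes a simplifying assumption: $f$ is taken to be increasing on $[a, b]$, so that on each sub-interval the minimum is attained at the left end-point and the maximum at the right end-point. The refinement used, inserting the mid-point of every sub-interval, is just one convenient choice, and all names are hypothetical.

```python
def lower_upper(f, points):
    """Lower and upper sums of an increasing f over the partition `points`."""
    pairs = list(zip(points, points[1:]))
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper

def refine(points):
    """Refinement that keeps every point and adds each sub-interval's mid-point."""
    refined = []
    for left, right in zip(points, points[1:]):
        refined.extend([left, (left + right) / 2])
    refined.append(points[-1])
    return refined

def refinement_table(f, a, b, levels=6):
    """Rows (number of sub-intervals, lower sum, upper sum) for successive refinements."""
    points = [a, b]
    rows = []
    for _ in range(levels):
        rows.append((len(points) - 1, *lower_upper(f, points)))
        points = refine(points)
    return rows

if __name__ == "__main__":
    for n, lower, upper in refinement_table(lambda x: x * x, 1.0, 2.0):
        print(n, lower, upper)
```

For $f(x) = x^2$ on $[1, 2]$ the printed lower sums rise and the upper sums fall as the partition is refined, with the two columns closing in on a common value.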

For the second and final stage of our argument we introduce the norm 
$\|\Delta\|_P$ of a partition $P$ by means of the definition 

$$\|\Delta\|_P = \max_i \, (x_i - x_{i-1}). \tag{7-9}$$

That is to say, for any partition $P$ of the interval $[a, b]$, the norm $\|\Delta\|_P$ is 
the length of the longest sub-interval of $[a, b]$ produced by the partition. 

Let us consider a sequence of partitions $P_1, P_2, \ldots, P_m, \ldots$ which are 
successive refinements of $P$ and are such that 

$$\lim_{m \to \infty} \|\Delta\|_{P_m} = 0.$$

Then by the postulate of Section 3-2, as $\{\underline{S}_{P_m}\}$ is monotonic increasing and 
bounded above it must tend to a limit $\underline{S}$ and, similarly, as $\{\overline{S}_{P_m}\}$ is 
monotonic decreasing and bounded below it must tend to a limit $\overline{S}$, where 

$$\underline{S} \le I \le \overline{S}. \tag{7-10}$$

To show that $\underline{S} = \overline{S}$, as would be expected, observe that if 

$$\delta_P = \max_i \, (M_i - m_i) \quad \text{for all } i,$$

then Eqn (7-4) gives rise to the inequality 

$$\overline{S}_P - \underline{S}_P \le \delta_P (\Delta_1 + \Delta_2 + \cdots + \Delta_n) = \delta_P (b - a). \tag{7-11}$$

Hence, for any sequence of partitions $P_1, P_2, \ldots, P_m, \ldots$ which are 
refinements of $P$ with the property that $\|\Delta\|_{P_m} \to 0$, it follows from the 
continuity of $f(x)$ that $\delta_{P_m} \to 0$, thereby showing that $\{\overline{S}_{P_m} - \underline{S}_{P_m}\}$ is a 
null sequence. Thus $\{\underline{S}_{P_m}\}$ and $\{\overline{S}_{P_m}\}$ both have the same limit. 

Taken in conjunction with Eqn (7-10), we have proved that because of the 
continuity of $f(x)$, the limit of the lower sums is equal to the limit of the upper 
sums, and each is equal to the limit $I$ which has been interpreted as the 
shaded area in Fig. 7-1. 

The limiting argument used above certainly suffices to define the area $I$, 
but before formulating our definition of the definite integral, let us first make 
a useful generalization of our argument. With the partition $P$ used earlier 
associate any set of $n$ numbers $\xi_1, \xi_2, \ldots, \xi_n$ for which it is true that 

$$x_0 \le \xi_1 \le x_1, \quad x_1 \le \xi_2 \le x_2, \quad \ldots, \quad x_{n-1} \le \xi_n \le x_n.$$

Now form the approximating sum $S_P$ defined by 

$$S_P = f(\xi_1)\Delta_1 + f(\xi_2)\Delta_2 + \cdots + f(\xi_n)\Delta_n. \tag{7-12}$$

Then because $m_i \le f(\xi_i) \le M_i$, it follows at once that 

$$\underline{S}_{P_m} \le S_{P_m} \le \overline{S}_{P_m}, \tag{7-13}$$

for all refinements $P_1, P_2, \ldots, P_m, \ldots$ of the partition $P$. Consequently, 
since $\lim \underline{S}_{P_m} = \lim \overline{S}_{P_m} = I$, it follows immediately from Theorem 3-6 that 
$\lim S_{P_m} = I$. 

This important result asserts that if $f(x)$ is continuous on $[a, b]$, then as 
the partition is refined, so the corresponding upper and lower sums $\overline{S}_{P_m}$, 
$\underline{S}_{P_m}$ and the approximating sum $S_{P_m}$ all converge to the same limit. We now 
state this as our first fundamental theorem which forms the basis of our 
development of the integral. 

Theorem 7-1 (first limit theorem for sums defined on a partition) Let 
$f(x)$ be a continuous non-negative function on the closed interval $a \le x \le b$, 
and let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive refinements of some 
partition $P$ of $[a, b]$ with the property that $\lim \|\Delta\|_{P_m} = 0$. Then, if $\xi_i$ is any 
point in the $i$th sub-interval of length $\Delta_i$ generated by the partition $P_m$, and 
$\underline{S}_{P_m}$ and $\overline{S}_{P_m}$ are respectively the lower and upper sums associated with $P_m$, 
it follows that 

$$\lim_{m \to \infty} \underline{S}_{P_m} = \lim_{m \to \infty} \overline{S}_{P_m} = \lim_{\|\Delta\|_{P_m} \to 0} \sum_{i=1}^{n} f(\xi_i)\Delta_i.$$

This theorem suggests the following form of definition for the definite 
integral. 

Definition 7-1 (definite integral of a continuous non-negative function) 
Let $f(x)$ be a continuous non-negative function on the closed interval 
$a \le x \le b$, and let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive 
refinements of some partition $P$ of $[a, b]$ with the property that $\lim \|\Delta\|_{P_m} = 0$. 
Then, if $\xi_i$ is any point in the $i$th sub-interval of length $\Delta_i$ generated by the 
partition $P_m$, the definite integral of $f(x)$ integrated over the interval $[a, b]$, 
and written symbolically 

$$\int_a^b f(x)\,dx,$$

is defined to be 

$$\int_a^b f(x)\,dx = \lim_{\|\Delta\|_{P_m} \to 0} \sum_{i=1}^{n} f(\xi_i)\Delta_i.$$

In the context of a definite integral, the function $f(x)$ is called the 
integrand, the numbers $a$, $b$ are called the lower and upper limits of integration, 
respectively, and the sign $\int$ itself is called the integral sign. 

In summary then, a definite integral of a positive continuous function 
$f(x)$ integrated over the interval $[a, b]$ is a positive number defined by means 
of a limiting process. It may be interpreted geometrically as the shaded area 
$I$ below the curve $y = f(x)$ as shown in Fig. 7-1. 

To show that this is a working definition, in the sense that it can be used 
to yield a useful answer, let us now apply it to a simple function. 

Example 71 Evaluate the definite integral 
x 2 dx, where a < b. 

J a 



308 / FUNDAMENTALS OF INTEGRATION CH 7 

Solution As x 2 is everywhere continuous and is non-negative on the stated 
interval Definition 7-1 applies. Thus we start by considering a convenient 
partition P n in which [a, b] is divided into n equal sub-intervals, each of 
length A = (b — d)\n. Then, if for convenience we identify f t with the right- 
hand end-point of the rth sub-interval, we have 

f i = a + A, | 2 = a + 2A, h = a + 3A, . . ., &, = a + «A. 

Hence, from Definition 7-1, 

/ = lim J (a + (A) 2 A. 

n— »-oo i = l 

Expanding and grouping the terms of the summation then gives

$$I = \lim_{n \to \infty} \left[ n a^2 \Delta + 2a\Delta^2 (1 + 2 + 3 + \cdots + n) + \Delta^3 (1^2 + 2^2 + 3^2 + \cdots + n^2) \right].$$

Using the fact that $\Delta = (b - a)/n$ together with the well-known results

$$1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2}$$

and

$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6},$$

it follows that

$$I = \lim_{n \to \infty} \left[ a^2(b - a) + a(b - a)^2 \left(\frac{n + 1}{n}\right) + (b - a)^3\, \frac{(n + 1)(2n + 1)}{6n^2} \right].$$

Thus, taking the limit, we find

$$I = \tfrac{1}{3}(b^3 - a^3),$$

and so

$$\int_a^b x^2\,dx = \tfrac{1}{3}(b^3 - a^3).$$

In terms of numbers, if $a = 1$, $b = 2$, then

$$\int_1^2 x^2\,dx = \tfrac{1}{3}(2^3 - 1^3) = \frac{7}{3}.$$
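The limiting process of Example 7-1 can also be watched numerically. The following is a minimal Python sketch, not part of the original text: the function name `right_riemann_sum` is an illustrative choice, and the sum it forms is the right-hand end-point sum used in the solution above, evaluated for $a = 1$, $b = 2$.

```python
def right_riemann_sum(f, a, b, n):
    """Sum of f(a + i*delta) * delta for i = 1..n, with delta = (b - a)/n.

    This is the right-hand end-point sum used in Example 7-1.
    """
    delta = (b - a) / n
    return sum(f(a + i * delta) for i in range(1, n + 1)) * delta


def square(x):
    return x * x


# Value obtained in the text: (b^3 - a^3)/3 with a = 1, b = 2.
exact = (2**3 - 1**3) / 3

for n in (10, 100, 1000, 10000):
    approx = right_riemann_sum(square, 1.0, 2.0, n)
    print(n, approx, abs(approx - exact))
```

The printed differences shrink roughly in proportion to $1/n$, which is what the term $a(b-a)^2(n+1)/n$ in the derivation suggests.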



When the behaviour of $f(x)$ is monotonic over the interval $a \le x \le b$,
then Theorem 7-1 coupled with Definition 7-1 can often be used to derive 
interesting and useful series approximations to the definite integral as the 
following example illustrates. 




Example 7-2 Show that

$$\sum_{r=1}^{n} \left(\frac{1}{n + r - 1}\right) > \int_1^2 \frac{dx}{x} > \sum_{r=1}^{n} \left(\frac{1}{n + r}\right).$$

Solution In this case $f(x) = 1/x$, which is continuous, positive, and
monotonic decreasing on the interval $[1, 2]$ so that Theorem 7-1 and Definition 7-1
apply. We again choose a partition $P_n$ which divides the interval $[1, 2]$ into
$n$ equal sub-intervals of length $\Delta = 1/n$. The general point $x_r$ in the partition
$P_n$ is, of course, $x_r = 1 + r/n$ so that

$$f(x_r) = \frac{n}{n + r}.$$

Thus as $f(x)$ is monotonic decreasing, it follows that on the interval $[x_{r-1}, x_r]$,
$f(x)$ attains its maximum value $M_r$ at $x_{r-1}$ and its minimum value $m_r$ at $x_r$,
where

$$M_r = \frac{n}{n + r - 1} \quad \text{and} \quad m_r = \frac{n}{n + r}.$$

Hence

$$\underline{S}_{P_n} = \sum_{r=1}^{n} \left(\frac{n}{n + r}\right) \frac{1}{n} \quad \text{and} \quad \overline{S}_{P_n} = \sum_{r=1}^{n} \left(\frac{n}{n + r - 1}\right) \frac{1}{n},$$

so that from Theorem 7-1 and Definition 7-1, we deduce that

$$\sum_{r=1}^{n} \left(\frac{1}{n + r - 1}\right) > \int_1^2 \frac{dx}{x} > \sum_{r=1}^{n} \left(\frac{1}{n + r}\right).$$



A few numbers might help here, so we show in the table below the behaviour
of the upper and lower sums $\overline{S}_{P_n}$ and $\underline{S}_{P_n}$ as a function of $n$.

      n      upper sum     lower sum
      5       0.7456        0.6456
     10       0.7188        0.6688
     15       0.7101        0.6768
     ∞        0.6931        0.6931


We shall discover later that the exact result, which is shown in this table
against the entry $n = \infty$, is in fact $\log_e 2$.
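The entries of this table are easy to reproduce. The short Python sketch below is an illustration only; the names `upper_sum` and `lower_sum` are assumptions introduced here, and the sums are exactly the two series of Example 7-2.

```python
import math


def lower_sum(n):
    # m_r * delta = (n / (n + r)) * (1 / n) = 1 / (n + r)
    return sum(1.0 / (n + r) for r in range(1, n + 1))


def upper_sum(n):
    # M_r * delta = (n / (n + r - 1)) * (1 / n) = 1 / (n + r - 1)
    return sum(1.0 / (n + r - 1) for r in range(1, n + 1))


for n in (5, 10, 15, 1000):
    print(n, round(upper_sum(n), 4), round(lower_sum(n), 4))

print("log_e 2 =", round(math.log(2.0), 4))
```

For every $n$ the two sums bracket $\log_e 2$, and their difference is $1/n - 1/(2n) = 1/(2n)$, so the bracket closes slowly.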

Before closing this section let us give brief consideration to the effect on
Theorem 7-1 of removing the condition of continuity imposed on the function
$f(x)$ and substituting instead the condition that $f(x)$ is bounded. The argument
leading to Theorem 7-1 proceeds as before until the stage at which $\underline{S}_{P_m}$
and $\overline{S}_{P_m}$ are defined. Then, without the continuity of $f(x)$ to ensure that
$|M_r - m_r| \to 0$ as $|x_r - x_{r-1}| \to 0$, it is no longer possible to infer that
when $\lim \underline{S}_{P_m}$ and $\lim \overline{S}_{P_m}$ exist, they are necessarily equal. However, if they
do exist and are equal, it follows as before that $S_{P_m}$ also converges to the
same limit. Thus we arrive at the following more general form of Theorem
7-1.

theorem 7-2 (second limit theorem for sums defined on a partition) Let
$f(x)$ be a non-negative bounded function defined on the closed interval $[a, b]$,
and let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive refinements of some
partition $P$ of $a \le x \le b$ with the property that $\lim \|\Delta\|_{P_m} = 0$. Then, if
$\xi_i$ is any point in the $i$th sub-interval of length $\Delta_i$ generated by the partition
$P_m$, and $\underline{S}_{P_m}$ and $\overline{S}_{P_m}$ are respectively the lower and upper sums associated
with $P_m$, it follows that if

$$\lim \underline{S}_{P_m} = \lim \overline{S}_{P_m} = I,$$

it must also be true that

$$I = \lim_{\|\Delta\|_{P_m} \to 0} \sum_{i=1}^{n} f(\xi_i)\,\Delta_i.$$

The corresponding modification of Definition 7-1 is given below for 
reference and, because this form of definition was first given by B. Riemann 
(1826-66), the definite integral is known formally as the Riemann integral. 
Usually only the term definite integral will be employed. 

definition 7-2 (Riemann integral of a non-negative function) Let $f(x)$
be a non-negative bounded function on the closed interval $a \le x \le b$, and
let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive refinements of some
partition $P$ of $[a, b]$ with the property that $\lim \|\Delta\|_{P_m} = 0$. Furthermore,
let $\xi_i$ be any point in the $i$th sub-interval of length $\Delta_i$ generated by the
partition $P_m$, and let $\underline{S}_{P_m}$ and $\overline{S}_{P_m}$ be, respectively, the lower and upper sums
associated with $P_m$.
Then, if

$$\lim \underline{S}_{P_m} = \lim \overline{S}_{P_m},$$

the Riemann integral of $f(x)$ integrated over the interval $[a, b]$, and written
symbolically

$$\int_a^b f(x)\,dx,$$

is defined to be

$$\int_a^b f(x)\,dx = \lim_{\|\Delta\|_{P_m} \to 0} \sum_{i=1}^{n} f(\xi_i)\,\Delta_i.$$


To show that not all bounded functions are Riemann integrable it is only
necessary to consider the integral over the interval $0 \le x \le 1$ of the function

$$f(x) = \begin{cases} 1 & \text{for } x \text{ rational} \\ 0 & \text{for } x \text{ irrational.} \end{cases}$$

Then clearly $f(x)$ is non-negative and bounded on $[0, 1]$, but by a suitable
choice of the numbers $\xi_i$ in the approximating sum of Definition 7-2, the
limit of the sum may be made to assume any value between zero and unity.
This situation arises because the limits of the upper and lower sums are not 
the same. In more advanced accounts these difficulties are overcome by 
defining a more general form of integral known as the Lebesgue integral. 

7-2 Integration of arbitrary continuous functions 

As most functions assume both positive and negative values in their domain 
of definition, our notion of a definite integral as formulated so far is rather 
restrictive, for it requires that the integrand be non-negative. A brief examina- 
tion of the introductory arguments used in the previous section shows that 
this restriction stems from our idea of area as being an essentially positive 
quantity, although this was not stated explicitly at any stage in our argument. 

Nothing in the limiting arguments that we used requires either the upper 
and lower sums themselves, or any of the terms comprising them to be non- 
negative. Since a term in either of these sums will be negative when m r or 
M r is negative, that is, when f(x) is negative, it follows that the inter- 
pretation of a definite integral as an area may be extended to continuous 
functions $f(x)$ which assume negative values provided that areas below the
x-axis are regarded as negative. This is illustrated in Fig. 7-4 in which the 
positive and negative area contributions to the definite integral of f(x) 
integrated over the interval [a, b] are marked accordingly. 

Thus using this convention when interpreting a definite integral as an 
area, we may remove the condition that the integrand $f(x)$ be non-negative
throughout all of Section 7-1. Because it simply amounts to the deletion of the
word 'non-negative', we shall not trouble to reformulate our earlier definitions 
and theorems to take account of this result. It is interesting to observe that 
had we introduced the definite integral via the upper and lower sums, without 
any appeal to graphs and areas, this artificial restriction would never have 
arisen. 

The definition of a definite integral of a function $f(x)$ integrated over the
interval [a, b] immediately implies a number of important general results 
which we now state in the form of a theorem. No proofs will be offered since 
the results are virtually self-evident. 

theorem 7-3 (properties of definite integrals) Let $f(x)$, $g(x)$ be continuous
functions defined on the closed interval $a \le x \le b$, and let $c$ be a constant
and $k$ be such that $a < k < b$. Then

(a) $\int_a^b f(x)\,dx = \int_a^k f(x)\,dx + \int_k^b f(x)\,dx$,

(b) $\int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx$ (Homogeneity),

(c) $\int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$ (Linearity).



Fig. 7-4 Positive and negative areas defined by y = f(x). 



By virtue of these results, the definite integral of the function $f(x)$
appropriate to Fig. 7-4 could, if desired, be written in terms of the sum of three
integrals involving non-negative integrands. To achieve this, notice that $f(x)$
is negative for $k_1 < x < k_2$, so that for all $x$ in this interval, $-f(x)$ is positive.
Then, first expressing our integral as the sum of three separate integrals over
adjacent intervals

$$\int_a^b f(x)\,dx = \int_a^{k_1} f(x)\,dx + \int_{k_1}^{k_2} f(x)\,dx + \int_{k_2}^{b} f(x)\,dx, \tag{7-14}$$

we can replace $-f(x)$ by $|f(x)|$ in the second of these integrals to obtain

$$\int_a^b f(x)\,dx = \int_a^{k_1} f(x)\,dx - \int_{k_1}^{k_2} |f(x)|\,dx + \int_{k_2}^{b} f(x)\,dx. \tag{7-15}$$

Each of these integrals is now the definite integral of a non-negative
function as required.

We must now take account of the fact that so far it has been implicit in 
our definition of a definite integral that x increases positively from a to b, 
where b > a. This sense, or direction, of integration is indicated in the definite 
integral by writing $a$ at the bottom of the integral sign $\int$ to signify the lower
limit of integration and by writing b at the top to signify the upper limit of 
integration. If, despite the fact that b > a, their positions as upper and lower 
limits of integration are reversed, this implies that integration is to be carried 
out in the direction in which x increases negatively. Because we are now 
allowing areas to have both magnitude and sign, to be consistent we must 
compensate for a reversal of the limits of integration by changing the sign of
the integral. Hence we arrive at our next definition.

definition 7-3 (reversal of limits of integration) If $a < b$, then we
define the definite integral

$$\int_b^a f(x)\,dx$$

of a continuous function $f(x)$ by the equation

$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx.$$

Example 7-3 Evaluate the definite integral

$$\int_3^1 2x^2\,dx.$$

Solution From Definition 7-3 we have

$$\int_3^1 2x^2\,dx = -\int_1^3 2x^2\,dx.$$

Hence an application of Theorem 7-3 (b) together with the result of Example
7-1 shows that

$$\int_3^1 2x^2\,dx = -2\int_1^3 x^2\,dx = -2\left(\tfrac{1}{3}\right)(3^3 - 1^3) = -\frac{52}{3}.$$

Since a definite integral is simply a number, the choice of symbol used to
denote the argument of the function $f$ forming the integrand is arbitrary, and
often it is convenient to replace $x$ by some other variable, say $t$. Thus

$$\int_a^b f(x)\,dx \quad \text{and} \quad \int_a^b f(t)\,dt$$

are identical in meaning, so that

$$\int_a^b f(x)\,dx = \int_a^b f(t)\,dt. \tag{7-16}$$

On account of this fact, the variable in the integrand of a definite integral
is often called a dummy variable, and it is sometimes said to be 'integrated
out' when the integral is evaluated. This fact is usually recognized in modern
accounts of the theory of the definite integral by simply writing

$$\int_a^b f$$

in place of either of the expressions in Eqn (7-16). The full significance of the
symbol $dx$, which is suggestive of a differential, comes when changes of
variable of the form $x = g(u)$ are made in Eqn (7-16) and it is for this reason
that we choose to retain it. This matter will be taken up in detail in the next
chapter, where it is shown that because of the chain rule for differentiation,
$dx$ can indeed be interpreted as a differential.

Fig. 7-5 (a) Area $I$ bounded by curves $y = f(x)$ and $y = g(x)$; (b) area below
$y = f(x)$; (c) positive and negative areas defined by $y = g(x)$.

Now that the definite integral has been extended to arbitrary continuous
integrands we are in a position to determine quite general areas. Consider,
for example, the situation illustrated in Fig. 7-5 (a) in which it is desired to
determine the area $I$ of the shaded region. Then obviously, referring to
Figs. 7-5 (b), (c) we have

$$I = I_1 + I_2 - I_3 + I_4,$$

where $I_1$ to $I_4$ represent the positive areas identified by these symbols.
However, we know that

$$I_1 = \int_a^b f(x)\,dx,$$

and from the form of argument leading to Eqn (7-15) we also know that

$$-I_2 = \int_a^{k_1} g(x)\,dx, \quad I_3 = \int_{k_1}^{k_2} g(x)\,dx, \quad -I_4 = \int_{k_2}^{b} g(x)\,dx,$$

where $k_1$ and $k_2$ are the first and second points of intersection of $y = g(x)$
with the $x$-axis as $x$ increases from $a$ to $b$.
However, by Theorem 7-3 (a) we have

$$-I_2 + I_3 - I_4 = \int_a^b g(x)\,dx,$$

so that combining these results we obtain

$$I = \int_a^b f(x)\,dx - \int_a^b g(x)\,dx.$$

From Theorem 7-3 (b) it then finally follows that

$$I = \int_a^b (f(x) - g(x))\,dx. \tag{7-17}$$

Fig. 7-6 Piecewise continuous function $y = f(x)$ defining a sequence of areas
$I_1, I_2, \ldots, I_{n+1}$.



Example 7-4 Find the area $I$ between the two curves $y = e^{2x}$ and $y = -x^2$,
which is bounded to the left by the line $x = 1$ and to the right by the line
$x = 3$.

Solution We start by making the obvious identifications $f(x) = e^{2x}$,
$g(x) = -x^2$, $a = 1$ and $b = 3$. Then from Eqn (7-17) it follows that

$$I = \int_1^3 (e^{2x} + x^2)\,dx,$$

whence, using the results of Example 7-1 and Problem 7-3, we find

$$I = \tfrac{1}{2}(e^6 - e^2) + \frac{26}{3}.$$
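As a check on Example 7-4, the area given by Eqn (7-17) can be approximated numerically and compared with the closed form. The sketch below is illustrative only: Simpson's rule is not a method used in the text, and the helper name `simpson` is an assumption introduced here.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson approximation to the integral of f over [a, b]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of sub-intervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def upper_curve(x):
    return math.exp(2.0 * x)  # f(x) = e^(2x)


def lower_curve(x):
    return -x * x  # g(x) = -x^2


# Eqn (7-17): the area is the integral of f(x) - g(x) over [1, 3].
area = simpson(lambda x: upper_curve(x) - lower_curve(x), 1.0, 3.0)
closed_form = 0.5 * (math.exp(6.0) - math.exp(2.0)) + 26.0 / 3.0

print(area, closed_form)
```

The two printed numbers should agree to many decimal places, since the integrand is smooth on $[1, 3]$.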

The fact that a definite integral is additive with respect to its interval of 
integration enables a function to be integrated even when it has discontinu- 
ities, provided only that they are finite in number and that elsewhere the 
function is continuous and bounded. This result is perhaps best seen dia- 
grammatically, though an analytical justification can easily be given without 
appeal to geometry. By way of example, consider the function $y = f(x)$
illustrated in Fig. 7-6 which is bounded and continuous everywhere except at
the discrete number of points $\eta_1, \eta_2, \ldots, \eta_n$. Such a function is said to be
piecewise continuous, for obvious reasons.

Using the valid interpretation of a definite integral in terms of area we see
that the total shaded area $I$ is the sum of the sequence of areas $I_1, I_2, \ldots,
I_{n+1}$, so that we may still write

$$I = \int_a^b f(x)\,dx, \tag{7-18}$$

but this time with the understanding that

$$\int_a^b f(x)\,dx = \int_a^{\eta_1 -} f(x)\,dx + \int_{\eta_1 +}^{\eta_2 -} f(x)\,dx + \cdots + \int_{\eta_n +}^{b} f(x)\,dx. \tag{7-19}$$

Here, as before, we have used $\eta_i -$ to signify the limiting process of
approaching the point $x = \eta_i$ from the left, and $\eta_i +$ to signify the limiting
process of approaching the point $x = \eta_i$ from the right.



Example 7-5 Evaluate the definite integral

$$I = \int_0^2 f(x)\,dx$$

when

$$f(x) = \begin{cases} x^2 & \text{for } 0 \le x \le 1 \\ e^{5x} & \text{for } 1 < x \le 2. \end{cases}$$

Solution From Eqn (7-19) we have

$$I = \int_0^{1-} x^2\,dx + \int_{1+}^{2} e^{5x}\,dx,$$

so that evaluating the integrals and then taking the appropriate limits gives

$$I = \tfrac{1}{3} + \tfrac{1}{5}(e^{10} - e^{5}).$$

Sometimes a more difficult situation than this arises in which either the 
integrand tends to infinity at some point in the interval of integration or, 
perhaps, the interval of integration itself is infinite in length. Such definite 
integrals are called improper integrals, and the way in which to attribute a 
value to any such integral is suggested by Eqn (7-19). 

Let us illustrate something of the difficulty that can arise if ideas are not
made precise. Consider the integral

$$\int_{-1}^{1} \frac{dx}{x^2}.$$

Then since $y = 1/x^2$ is essentially positive, the area under the curve must
also be positive. Now if we apply the result of Problem 7-5 we have

$$\int_{-1}^{1} \frac{dx}{x^2} = \frac{1}{-1} - \frac{1}{1} = -2,$$

which, since it is negative, contradicts our previous conclusion. What has
gone wrong? The trouble is that $1/x^2$ tends to infinity as $x \to 0$, so that the
arguments of Problem 7-5 are not applicable, for it was pre-supposed there
that the interval of integration excluded the origin. When dealing with
improper integrals of this type in which the integrand has an infinity within
the interval of integration we shall assign a value to the integral according to
the following definition.

definition 7-4 (improper integral due to infinity of integrand) Let the
function $f(x)$ be continuous throughout the intervals $a \le x < c$ and
$c < x \le b$, and suppose that $f(x)$ has a singularity at $x = c$ in the sense that
$f(x)$ tends to infinity as $x \to c$. Then the integral of $f(x)$ over the interval of
integration $[a, b]$ is said to be improper, and it is defined to have the value

$$I = \lim_{\varepsilon \to 0} \int_a^{c - \varepsilon} f(x)\,dx + \lim_{\delta \to 0} \int_{c + \delta}^{b} f(x)\,dx,$$

whenever both limits involved exist. Under these circumstances the improper
integral will be said to converge to the value $I$. When either of the limits does
not exist, the integral will be said to be divergent. If the point $c$ coincides with
an end-point of the interval $[a, b]$, then $I$ is defined to be equal to the limit of
the single integral for which the interval of integration lies within $[a, b]$.

On the basis of this definition we are now able to determine the value to 
be attributed to the improper integral used as an illustration above. Let us do 
this in the form of an example. 

Example 7-6 Evaluate the improper integrals:

(a) $\int_{-1}^{1} \frac{dx}{x^2}$; (b) $\int_{-1}^{0} \frac{x^2 + 1}{x^2}\,dx$.

Solution The integrand $1/x^2$ tends to infinity as $x \to 0$, so that for case (a),
when appealing to Definition 7-4, we need to make the identifications
$a = -1$, $b = 1$, $c = 0$ and $f(x) = 1/x^2$. Thus,

$$I_1 = \lim_{\varepsilon \to 0} \int_{-1}^{-\varepsilon} \frac{dx}{x^2} + \lim_{\delta \to 0} \int_{\delta}^{1} \frac{dx}{x^2}.$$

Using the result of Problem 7-5 we find that

$$I_1 = \lim_{\varepsilon \to 0} \left(\frac{1}{\varepsilon} - 1\right) + \lim_{\delta \to 0} \left(-1 + \frac{1}{\delta}\right) \to \infty.$$

Thus the improper integral (a) is divergent.

In case (b) the integrand is $(x^2 + 1)/x^2$, which again tends to infinity as
$x \to 0$. However, in this case we must make the identifications $a = -1$,
$b = 0$, $c = 0$, and $f(x) = 1 + 1/x^2$, so that this time the singularity in the
integrand occurs at the right-hand end-point of the interval of integration
$[-1, 0]$ (that is, at the upper limit of integration).
It then follows from Definition 7-4 that

$$I_2 = \lim_{\varepsilon \to 0} \int_{-1}^{-\varepsilon} \left(1 + \frac{1}{x^2}\right) dx,$$

which, from the results of Problems 7-2 (b) and 7-5, becomes

$$I_2 = \lim_{\varepsilon \to 0} \left(-\varepsilon + 1 + \frac{1}{\varepsilon} - 1\right) \to \infty.$$

Hence the improper integral (b) is also divergent.
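The divergence in case (a) can be seen numerically by cutting out a small interval around the singularity and letting it shrink. The sketch below is only an illustration: it takes $\delta = \varepsilon$ for simplicity, uses a midpoint rule that is not part of the text, and the names `midpoint_rule` and `truncated_integral` are assumptions introduced here. The comparison value $2/\varepsilon - 2$ is the sum of the two bracketed expressions for $I_1$ above.

```python
def midpoint_rule(f, a, b, n=20000):
    """Composite midpoint approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


def inv_square(x):
    return 1.0 / (x * x)


def truncated_integral(eps):
    # Integrate over [-1, -eps] and [eps, 1], excluding the singularity at 0.
    left = midpoint_rule(inv_square, -1.0, -eps)
    right = midpoint_rule(inv_square, eps, 1.0)
    return left + right


for eps in (0.5, 0.1, 0.02):
    print(eps, truncated_integral(eps), 2.0 / eps - 2.0)
```

The truncated values are positive and grow without bound as $\varepsilon$ decreases, in agreement with the conclusion that the integral diverges rather than equalling $-2$.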

The one remaining form of improper integral requiring consideration 
occurs when the interval of integration is infinite. In these circumstances we 
shall assign a value to the integral according to the following definition. 

definition 7-5 (improper integral due to infinite interval of integration)
Let the function $f(x)$ be continuous on the interval $[a, \infty)$, then the integral
of $f(x)$ over the interval of integration $[a, \infty)$ is said to be improper, and it is
defined to have the value

$$I_1 = \lim_{k \to \infty} \int_a^k f(x)\,dx,$$

whenever this limit exists. Under these circumstances the improper integral
will be said to converge to the value $I_1$. When the limit does not exist, the
integral will be said to be divergent. Similarly, if the interval of integration is
$(-\infty, b]$, then when the limit exists, the improper integral of $f(x)$ over the
interval of integration $(-\infty, b]$ is defined to have the value

$$I_2 = \lim_{k \to \infty} \int_{-k}^{b} f(x)\,dx.$$

Symbolically, these improper integrals will be denoted, respectively, by

$$I_1 = \int_a^{\infty} f(x)\,dx \quad \text{and} \quad I_2 = \int_{-\infty}^{b} f(x)\,dx.$$

Example 7-7 Evaluate the improper integral

$$I = \int_3^{\infty} \frac{dx}{x^2}.$$

Solution It follows at once from Definition 7-5 that

$$I = \lim_{k \to \infty} \int_3^k \frac{dx}{x^2},$$

so that by virtue of the result of Problem 7-5,

$$I = \lim_{k \to \infty} \left[-\frac{1}{k} + \frac{1}{3}\right] = \frac{1}{3}.$$

Hence this improper integral converges to the value 1/3.
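Definition 7-5 can be illustrated by evaluating the integral over $[3, k]$ for increasing $k$. The following Python sketch is an assumption-laden illustration rather than the book's method: it uses a midpoint rule, and the names `midpoint_rule` and `truncated` are introduced here for the purpose.

```python
def midpoint_rule(f, a, b, n):
    """Composite midpoint approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


def truncated(k, n=100000):
    # Integral of 1/x^2 over the finite interval [3, k].
    return midpoint_rule(lambda x: 1.0 / (x * x), 3.0, k, n)


for k in (10.0, 100.0, 1000.0):
    # The second printed value is the expression 1/3 - 1/k from the solution.
    print(k, truncated(k), 1.0 / 3.0 - 1.0 / k)
```

As $k$ increases the printed values approach $1/3$ from below, the shortfall being $1/k$.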

7-3 Integral inequalities 

A number of useful inequalities may be deduced concerning definite integrals,
the simplest of which has already been stated in Eqn (7-1). Let us now derive
our first result of this type, of which Eqn (7-1) represents a special case.

Suppose that the definite integrals of $f(x)$ and $g(x)$ taken over the interval
$[a, b]$ both exist. In brief, let us agree to say that $f(x)$ and $g(x)$ are integrable
over the interval $[a, b]$. Now suppose that $f(x) \le g(x)$ for $a \le x \le b$. Then
if $P_m$ is a partition of $[a, b]$, we have from Theorem 7-2 that

$$\int_a^b g(x)\,dx - \int_a^b f(x)\,dx = \int_a^b (g(x) - f(x))\,dx = \lim_{\|\Delta\|_{P_m} \to 0} \sum_{i=1}^{n} (g(\xi_i) - f(\xi_i))\,\Delta_i, \tag{7-20}$$

where $\xi_i$ is some point in the $i$th sub-interval of length $\Delta_i$ generated by the
partition $P_m$. Now since by hypothesis $f(x) \le g(x)$, it follows that
$f(\xi_i) \le g(\xi_i)$, so that the right-hand side of Eqn (7-20) must be non-negative. Thus
we have proved the following theorem.

theorem 7-4 (inequality between two definite integrals) Let $f(x) \le g(x)$
be two integrable functions over the interval $[a, b]$. Then,

$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$



Equation (7-1) follows as a trivial consequence of this result, for the
theorem implies that if $\phi(x) \le f(x) \le \psi(x)$ are three integrable functions
over the interval $[a, b]$, then

$$\int_a^b \phi(x)\,dx \le \int_a^b f(x)\,dx \le \int_a^b \psi(x)\,dx.$$

Hence, if $m$, $M$ are, respectively, the minimum and maximum values of $f(x)$
on $[a, b]$, our required result follows by setting $\phi(x) = m$, $\psi(x) = M$, when
we obtain

$$m(b - a) \le \int_a^b f(x)\,dx \le M(b - a). \tag{7-21}$$

This last simple result implies a more important result which we now
derive by appeal to the intermediate value theorem of Chapter 5. Writing
inequality (7-21) in the form

$$m \le \frac{1}{b - a} \int_a^b f(x)\,dx \le M$$

shows that the number

$$\frac{1}{b - a} \int_a^b f(x)\,dx$$

is intermediate between $m$ and $M$ which are extreme values of the function
$f(x)$ itself. Hence, provided $f(x)$ is continuous, it then follows from the
intermediate value theorem that some number $\xi$ exists, strictly between $a$ and $b$,
such that

$$f(\xi) = \frac{1}{b - a} \int_a^b f(x)\,dx. \tag{7-22}$$



This result is called the first mean value theorem for integrals, and it 
constitutes our next theorem. 

theorem 7-5 (first mean value theorem for integrals) Let $f(x)$ be continuous
on the interval $[a, b]$, then there exists a number $\xi$, strictly between
$a$ and $b$, for which

$$\int_a^b f(x)\,dx = (b - a) f(\xi).$$

Fig. 7-7 Area below $y = f(t)$ as a function of the upper limit of integration $x$.
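To make Theorem 7-5 concrete, take $f(x) = x^2$ on $[1, 2]$, whose integral was found in Example 7-1 to be $7/3$. The sketch below locates the number $\xi$ by bisection. It is an illustration only: the function name `mean_value_point` is an assumption, and bisection is used here because $f(a)$ and $f(b)$ lie on opposite sides of the mean value for this particular integrand.

```python
import math


def mean_value_point(f, a, b, integral, tol=1e-12):
    """Find xi in (a, b) with f(xi) = integral / (b - a) by bisection.

    Assumes f is continuous and that f(a) and f(b) lie on opposite sides
    of the mean value, as happens for a monotonic integrand.
    """
    mean = integral / (b - a)
    lo, hi = a, b
    g_lo = f(lo) - mean
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = f(mid) - mean
        if (g_lo < 0) == (g_mid < 0):
            lo, g_lo = mid, g_mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return 0.5 * (lo + hi)


xi = mean_value_point(lambda x: x * x, 1.0, 2.0, 7.0 / 3.0)
print(xi, math.sqrt(7.0 / 3.0))
```

Here $\xi = \sqrt{7/3}$, which lies strictly between 1 and 2 as the theorem requires.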

7-4 The definite integral as a function of its upper 
limit-indefinite integral 

If the lower limit of a definite integral is held constant, but the upper limit is
replaced by the variable $x$, then the numerical value of the integral will clearly
depend on $x$. Another way of describing this situation is to say that a
definite integral with a variable upper limit $x$ defines a function of $x$. In Fig.
7-7 this idea is illustrated in terms of areas, with the shaded region marked
$F(x)$ denoting the area below the curve $y = f(t)$ which is bounded on the
left by the line $t = a$, and on the right by the line $t = x$.
In terms of the definite integral we have

$$F(x) = \int_a^x f(t)\,dt. \tag{7-23}$$

Now let us suppose that $f(t)$ is continuous in some interval $[a, b]$, with
$a < x < b$. Notice here that for the first time it is necessary to use the dummy
variable $t$, because $x$ and $t$ are fulfilling two different roles in Eqn (7-23). To be
precise, $x$ represents the upper limit of integration, whilst the dummy variable
$t$ represents the general variable in the interval of integration $a \le t \le x$.
Consider the difference

$$F(x + h) - F(x) = \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt = \int_x^{x+h} f(t)\,dt. \tag{7-24}$$

Then the first mean value theorem for integrals allows us to rewrite Eqn (7-24)
in the form

$$F(x + h) - F(x) = h f(\xi), \tag{7-25}$$

where $x < \xi < x + h$.

Now, forming the difference quotient $\{F(x + h) - F(x)\}/h$, we find

$$\frac{F(x + h) - F(x)}{h} = f(\xi),$$

so that taking the limit as $h \to 0$ gives,

$$F'(x) = \lim_{h \to 0} \left\{ \frac{F(x + h) - F(x)}{h} \right\} = f(x). \tag{7-26}$$

This important result shows that the integrand of integral (7-23) at the
upper limit of integration $t = x$ is equal to the derivative of $F(x)$ with respect
to $x$.

Suppose now that $G(x)$ is any function for which $G'(x) = f(x)$. Then,

$$G'(x) - F'(x) = \frac{d}{dx}\,[G(x) - F(x)] = 0,$$

and so from Corollary 5-12

$$G(x) = F(x) + \text{constant}. \tag{7-27}$$

Combining Eqns (7-23) and (7-27) shows that the most general function
$G(x)$ whose derivative is equal to $f(x)$ must be of the form

$$G(x) = \int_a^x f(t)\,dt + C, \tag{7-28}$$

where $C$ is a constant.




The first term on the right-hand side of Eqn (7-28) is called an indefinite 
integral. The function $G(x)$ itself is called either a primitive of $f$ or an
antiderivative of $f$. We shall usually use the name antiderivative, since this offers
an accurate description of the process by which it is to be found. Namely, an 
antiderivative arises from the process of reversing the operation of differ- 
entiation, and the most frequent method of finding antiderivatives utilizes 
this idea by employing tables of derivatives in reverse. That is to say, by 
matching an integrand with an entry in a table of derivatives and thereby 
finding the functional form of G(x) apart from the additive arbitrary constant. 

Usually the antiderivative $G(x)$ defined in either Eqn (7-27) or Eqn (7-28)
is written symbolically in the form

$$\int f(x)\,dx = F(x) + C. \tag{7-29}$$

In this notation, the fact that an antiderivative is a function related to the 
operation of integration, and not just a number as in an ordinary definite 
integral, is indicated by again employing the integral sign, but this time without 
limits. On occasions the reader will find books in which an antiderivative is 
signified by the notation 



$$\int^{x} f(x)\,dx,$$



rather than the notation used in Eqn (7-29). 

The following short table lists a few of the antiderivatives which are of 
most frequent occurrence in mathematics. 



Table 7.1  $\int f(x)\,dx = F(x) + C$

      f(x)               F(x)
  1   a (const)          ax
  2   x^n                x^(n+1)/(n + 1)
  3   e^(λx)             e^(λx)/λ
  4   sin x              -cos x
  5   cos x              sin x

Other useful elementary antiderivatives that should be memorized, 
together with an account of systematic methods for finding antiderivatives, 
are given in the next chapter. 

Let us now return to Eqn (7-27) and notice that it follows from this that

$$G(b) - G(a) = F(b) - F(a) = F(b) = \int_a^b f(x)\,dx. \tag{7-30}$$

Hence we have proved that

$$\int_a^b f(x)\,dx = G(b) - G(a), \tag{7-31}$$

where $G'(x) = f(x)$. This provides a method for the evaluation of definite
integrals, for expressed in words it asserts that the definite integral of $f(x)$ taken
over an interval $[a, b]$ is the difference between the value of any antiderivative
of $f(x)$ at $x = b$ and $x = a$.

It is now time to express results (7-26) and (7-31) in the form of two basic
theorems known, respectively, as the first and second fundamental theorems
of calculus.

theorem 7-6 (first fundamental theorem of calculus) If $f(x)$ is continuous
for $a \le x \le b$, and

$$F(x) = \int_a^x f(t)\,dt,$$

then $F'(x) = f(x)$ for all points $x$ in $[a, b]$.

Alternatively expressed, this result may also be written

$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x).$$

theorem 7-7 (second fundamental theorem of calculus) If $f(x)$ is continuous
for $a \le x \le b$ and $G(x)$ is any antiderivative of $f(x)$, then

$$\int_a^x f(t)\,dt = G(x) - G(a).$$

The statement of Theorem 7-7 is often written in the form

$$\int_a^b f(x)\,dx = G(x)\Big|_{x=a}^{x=b}$$

with the understanding that

$$G(x)\Big|_{x=a}^{x=b} = G(b) - G(a).$$

It follows from Theorem 7-7 that the definite integral calculated so
laboriously in Example 7-1 may be evaluated directly by appeal to entry
number 2 in Table 7.1. To see this set $n = 2$, so that $f(x) = x^2$, then
$F(x) = x^3/3$, and by Theorem 7-7 we immediately deduce that

$$\int_a^b x^2\,dx = \tfrac{1}{3}(b^3 - a^3).$$
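Both fundamental theorems can be checked numerically for a simple integrand. The sketch below is an illustration under stated assumptions: it takes $f(t) = \cos t$ with antiderivative $G(x) = \sin x$ (entry 5 of Table 7.1), builds $F(x)$ by Simpson's rule, which is not a method from the text, and uses the illustrative names `simpson` and `F`.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson approximation to the integral of f over [a, b]."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def F(x, a=0.0):
    # F(x) = integral of cos t from a to x, as in Eqn (7-23).
    return simpson(math.cos, a, x)


# Theorem 7-6: the derivative of F at x should equal the integrand at x.
x, h = 1.0, 1e-4
derivative = (F(x + h) - F(x - h)) / (2 * h)
print(derivative, math.cos(x))

# Theorem 7-7: F(b) should equal G(b) - G(a) with G(x) = sin x.
print(F(2.0), math.sin(2.0) - math.sin(0.0))
```

The central difference quotient is used instead of the one-sided quotient of Eqn (7-26) only because it is more accurate for a given $h$; both tend to the same limit.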






The systematic employment of the fundamental theorems of calculus will 
be taken up in detail in Chapter 8, since our concern here is primarily with 
the theory rather than the practice of integration. 

Finally, to emphasize that the indefinite integral is a function, we now
give an example of such an integral which defines an important mathematical
function. Since we have the relationship

$$\frac{d}{dx} \log_e x = \frac{1}{x}, \quad \text{for } x > 0,$$

it follows from Theorem 7-7 that, provided $a > 0$,

$$\int_a^x \frac{dt}{t} = \log_e x - \log_e a.$$

Hence, setting $a = 1$ gives the result

$$\log_e x = \int_1^x \frac{dt}{t}, \tag{7-32}$$

which is illustrated as the shaded area in Fig. 7-8.

Fig. 7-8 Natural logarithm represented as an area.



7-5 Differentiation of an integral containing a 
parameter 

It can sometimes happen that an integrand, in addition to being a function of
$x$, also depends on a parameter $\alpha$. Furthermore, the upper and lower limits
of the integral may themselves be functions of $\alpha$ so that the value of the
integral must then itself depend on $\alpha$. Our concern in this section will be with
the differentiation, with respect to $\alpha$, of an integral of the form

$$I(\alpha) = \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha)\,dx. \tag{7-33}$$



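To derive the form of our result let us begin by assuming that $\phi(\alpha)$, $\psi(\alpha)$
are differentiable functions with respect to $\alpha$ in some interval $c < \alpha < d$,
and that $f(x, \alpha)$ is both integrable with respect to $x$ on the interval
$[\phi(\alpha), \psi(\alpha)]$ and differentiable with respect to $\alpha$. Then, first notice that
from the mean value theorem for derivatives, in $c < \alpha + h < d$, we have

$$\phi(\alpha + h) = \phi(\alpha) + h \left(\frac{d\phi}{d\alpha}\right)_{\alpha = \xi}, \quad \text{with } \alpha < \xi < \alpha + h;$$

$$\psi(\alpha + h) = \psi(\alpha) + h \left(\frac{d\psi}{d\alpha}\right)_{\alpha = \eta}, \quad \text{with } \alpha < \eta < \alpha + h; \tag{7-34}$$

$$f(x, \alpha + h) = f(x, \alpha) + h \left(\frac{\partial f}{\partial \alpha}\right)_{\alpha = \zeta}, \quad \text{with } \alpha < \zeta < \alpha + h.$$

The partial derivative notation is needed in the last of these results because
for this application of the mean value theorem for derivatives we are regarding
the variable $x$ as a constant.
Now we have

$$I(\alpha + h) = \int_{\phi(\alpha + h)}^{\psi(\alpha + h)} f(x, \alpha + h)\,dx,$$

so that using results (7-34), and writing $\phi'$, $\psi'$ for the derivatives of $\phi$, $\psi$
evaluated at $\xi$, $\eta$, respectively, we find

$$I(\alpha + h) = \int_{\psi(\alpha)}^{\psi(\alpha) + h\psi'} f(x, \alpha + h)\,dx + \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha + h)\,dx + \int_{\phi(\alpha) + h\phi'}^{\phi(\alpha)} f(x, \alpha + h)\,dx.$$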

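An application of the mean value theorem for integrals (Theorem 7-5) to
the first and last terms then shows that

$$I(\alpha + h) = h \left(\frac{d\psi}{d\alpha}\right)_{\alpha = \eta} f(x', \alpha + h) + \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha + h)\,dx - h \left(\frac{d\phi}{d\alpha}\right)_{\alpha = \xi} f(x'', \alpha + h),$$

where $\psi(\alpha) < x' < \psi(\alpha) + h\psi'$, $\phi(\alpha) < x'' < \phi(\alpha) + h\phi'$.

Next, forming the difference $I(\alpha + h) - I(\alpha)$, combining integrals and
using the final result of (7-34) gives

$$I(\alpha + h) - I(\alpha) = h \left(\frac{d\psi}{d\alpha}\right)_{\alpha = \eta} f(x', \alpha + h) + h \int_{\phi(\alpha)}^{\psi(\alpha)} \left(\frac{\partial f}{\partial \alpha}\right)_{\alpha = \zeta} dx - h \left(\frac{d\phi}{d\alpha}\right)_{\alpha = \xi} f(x'', \alpha + h). \tag{7-35}$$

Finally, forming the difference quotient $\{I(\alpha + h) - I(\alpha)\}/h$ and taking
the limit as $h \to 0$ it follows that $\xi$, $\eta$, and $\zeta$ all tend to $\alpha$, whilst $x'$ tends to
$\psi(\alpha)$ and $x''$ tends to $\phi(\alpha)$, whence

$$\frac{dI}{d\alpha} = \left(\frac{d\psi}{d\alpha}\right) f(\psi(\alpha), \alpha) - \left(\frac{d\phi}{d\alpha}\right) f(\phi(\alpha), \alpha) + \int_{\phi(\alpha)}^{\psi(\alpha)} \frac{\partial f}{\partial \alpha}\,dx. \tag{7-36}$$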



theorem 7-8 (differentiation of an integral containing a parameter) Let
$\phi(\alpha)$, $\psi(\alpha)$ be differentiable functions with respect to $\alpha$ in some interval
$c < \alpha < d$, and let $f(x, \alpha)$ be both integrable with respect to $x$ over the
interval $\phi(\alpha) \le x \le \psi(\alpha)$ and differentiable with respect to $\alpha$. Then,

$$\frac{d}{d\alpha} \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha)\,dx = \left(\frac{d\psi}{d\alpha}\right) f(\psi(\alpha), \alpha) - \left(\frac{d\phi}{d\alpha}\right) f(\phi(\alpha), \alpha) + \int_{\phi(\alpha)}^{\psi(\alpha)} \frac{\partial f}{\partial \alpha}\,dx.$$

A useful special case of this arises when $\phi(\alpha) = a$ and $\psi(\alpha) = b$ are
constants, so that the only dependence on the parameter $\alpha$ is through the
integrand $f(x, \alpha)$. The terms $d\phi/d\alpha$ and $d\psi/d\alpha$ are then identically zero, so that we
arrive at the following corollary.

Corollary 7.8 If $f(x, \alpha)$ is both integrable with respect to $x$ over the interval
$[a, b]$ and differentiable with respect to $\alpha$, then

$$\frac{d}{d\alpha} \int_a^b f(x, \alpha)\,dx = \int_a^b \frac{\partial f}{\partial \alpha}\,dx.$$



Example 7-8 Apply the results of Theorem 7-8 to the following integral:

$$I(\alpha) = \int_{1 + \cos\alpha}^{3 + 2\sin 3\alpha} \frac{dx}{x^2 + \alpha^2}.$$

Solution If we make the identifications $\phi(\alpha) = 1 + \cos\alpha$,
$\psi(\alpha) = 3 + 2\sin 3\alpha$, and $f(x, \alpha) = (x^2 + \alpha^2)^{-1}$, it then follows directly from Theorem
7-8 that

$$\frac{dI}{d\alpha} = \frac{6\cos 3\alpha}{(3 + 2\sin 3\alpha)^2 + \alpha^2} + \frac{\sin\alpha}{(1 + \cos\alpha)^2 + \alpha^2} - 2\alpha \int_{1 + \cos\alpha}^{3 + 2\sin 3\alpha} \frac{dx}{(x^2 + \alpha^2)^2}.$$
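The result of Example 7-8 can be tested numerically by comparing the formula of Theorem 7-8 with a finite-difference estimate of $dI/d\alpha$. The sketch below is illustrative only: the value $\alpha = 0.7$ is an arbitrary choice, Simpson's rule is not a method from the text, and the names `simpson`, `phi`, `psi`, `integral_I` and `derivative_by_theorem` are assumptions introduced here.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson approximation to the integral of f over [a, b]."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def phi(alpha):
    return 1.0 + math.cos(alpha)  # lower limit


def psi(alpha):
    return 3.0 + 2.0 * math.sin(3.0 * alpha)  # upper limit


def integral_I(alpha):
    return simpson(lambda x: 1.0 / (x * x + alpha * alpha), phi(alpha), psi(alpha))


def derivative_by_theorem(alpha):
    lower, upper = phi(alpha), psi(alpha)
    # Boundary terms: psi' f(psi, alpha) - phi' f(phi, alpha), with phi' = -sin(alpha).
    boundary = (6.0 * math.cos(3.0 * alpha) / (upper**2 + alpha**2)
                + math.sin(alpha) / (lower**2 + alpha**2))
    # Interior term: integral of the partial derivative of f with respect to alpha.
    interior = simpson(lambda x: -2.0 * alpha / (x * x + alpha * alpha) ** 2, lower, upper)
    return boundary + interior


alpha, h = 0.7, 1e-5
finite_difference = (integral_I(alpha + h) - integral_I(alpha - h)) / (2 * h)
print(finite_difference, derivative_by_theorem(alpha))
```

Agreement between the two printed values supports the signs of the three terms, in particular the positive sign of the second term, which arises because $d\phi/d\alpha = -\sin\alpha$.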



7-6 Other geometrical applications of definite integrals

This section offers a brief discussion of the application of the definite integral 
to the determination of arc length for plane curves, the surface area of a 
surface of revolution, and the volume of a volume of revolution. Each result 
will be derived by appeal to the basic definition of a definite integral, since it 
will first be necessary to define the precise meaning of the concepts that are 
involved. 



Fig. 7-9 (a) Arc length of curve; (b) element of arc length.



(a) Arc length of a plane curve 

Consider the plane curve $\Gamma$ with the equation $y = f(x)$ illustrated in
Fig. 7-9 (a). Then our task here will be first to define the meaning of the
length $s$ of the arc $MN$, and then to deduce a method by which it may be
found once the equation of $\Gamma$ has been given. Let $Q_0, Q_1, \ldots, Q_n$ represent
any set of points on $\Gamma$, the first of which coincides with the left-hand
end-point $M$, and the last of which coincides with the right-hand end-point $N$.
Then if $\Delta s_i$ denotes the length of the chord joining $Q_{i-1}$ to $Q_i$, the length $S_n$
of the polygonal line joining $M$ to $N$ is

$$S_n = \sum_{i=1}^{n} \Delta s_i.$$

Now the projection of the set of points $Q_0, Q_1, \ldots, Q_n$ onto the $x$-axis
defines a set of points $a = x_0 < x_1 < \ldots < x_n = b$ which form a partition
$P_n$ of the interval $[a, b]$. Thus, denoting the norm of $P_n$ by $\|\Delta\|_{P_n}$, we shall
define the length $s$ of the arc $\Gamma$ from $M$ to $N$ to be

$$s = \lim_{\|\Delta\|_{P_n} \to 0} \sum_{i=1}^{n} \Delta s_i. \tag{7-37}$$



Now, setting $\Delta_i = x_i - x_{i-1}$ and $\delta_i = f(x_i) - f(x_{i-1})$, it follows directly by an application of Pythagoras' theorem (Fig. 7-9 (b)) that

$$\Delta s_i = \sqrt{\Delta_i^2 + \delta_i^2} = \Delta_i\sqrt{1 + \left(\frac{\delta_i}{\Delta_i}\right)^2}.$$

However, by virtue of the mean value theorem for derivatives we may write, provided that $f(x)$ is differentiable on $[a, b]$,

$$\frac{\delta_i}{\Delta_i} = \frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}} = f'(\xi_i),$$

where $x_{i-1} < \xi_i < x_i$, and so

$$\Delta s_i = \sqrt{1 + [f'(\xi_i)]^2}\,\Delta_i. \tag{7-38}$$

Thus the desired arc length $s$ will be determined by evaluating

$$s = \lim_{\|\Delta\|_{P_n}\to 0}\ \sum_{i=1}^{n} \sqrt{1 + [f'(\xi_i)]^2}\,\Delta_i. \tag{7-39}$$

We see from Definition 7-2 that this is simply the definite integral of the function $\sqrt{1 + [f'(x)]^2}$ integrated from $x = a$ to $x = b$, and hence

$$s = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx. \tag{7-40}$$



Theorem 7-10 (arc length of plane curve) Let $y = f(x)$ be a differentiable function on the interval $[a, b]$. Then the length $s$ of the plane curve $\Gamma$ defined by the graph of this function in the $(x, y)$-plane between the points $(a, f(a))$, $(b, f(b))$ is given by

$$s = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx.$$



Example 7-9 Determine the length of arc of the curve y = cosh x between the 
points (1, cosh 1) and (3, cosh 3). 

Solution We have $a = 1$, $b = 3$, $y = \cosh x$, and so $dy/dx = \sinh x$, whence

$$s = \int_1^3 \sqrt{1 + \sinh^2 x}\,dx = \int_1^3 \cosh x\,dx.$$

Now since $d/dx\,(\sinh x) = \cosh x$, it follows that $\sinh x + C$ is an antiderivative of $\cosh x$, so that by Theorem 7-7 we have

$$s = \int_1^3 \cosh x\,dx = (\sinh x + C)\Big|_1^3 = \sinh 3 - \sinh 1.$$
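The definition (7-37) of arc length as the limit of polygonal lengths can be tried directly on Example 7-9. The following sketch is illustrative only (the function name `polygon_length` and the choice of equally spaced points are assumptions): it sums the chord lengths for a partition of $[1, 3]$ and compares the total with $\sinh 3 - \sinh 1$.

```python
import math

def polygon_length(f, a, b, n):
    """Length of the polygonal line through n + 1 equally spaced points on y = f(x)."""
    h = (b - a) / n
    length = 0.0
    for i in range(1, n + 1):
        x0 = a + (i - 1) * h
        x1 = a + i * h
        # chord length from Pythagoras' theorem, as in Fig. 7-9 (b)
        length += math.hypot(x1 - x0, f(x1) - f(x0))
    return length

exact = math.sinh(3) - math.sinh(1)
for n in (4, 16, 64, 256):
    print(n, polygon_length(math.cosh, 1.0, 3.0, n), exact)
```

As the partition is refined, the polygonal length increases towards the value found from Theorem 7-10.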



Fig. 7-10 Length of parametrically defined curve $\Gamma$.

Theorem 7-10 will fail for curves $\Gamma$ of the type shown in Fig. 7-10, for any representation of the function in the form $y = f(x)$ will not be single valued on the interval $[\alpha, \beta]$, and so it will not be differentiable there.

The difficulty here is easily overcome by using the fact that each point on the curve $\Gamma$ can be uniquely defined and a unique derivative assigned if the curve $\Gamma$ is capable of parametric representation in the form

$$x = \phi(t), \quad y = \psi(t) \quad \text{for } T_0 \le t \le T_1, \tag{7-41}$$

with $\phi(t)$, $\psi(t)$ differentiable on $[T_0, T_1]$.

Using the result for parametric differentiation

$$f'(x) = \frac{dy}{dx} = \frac{\psi'(t)}{\phi'(t)}$$

in Eqn (7-39), and then employing the differential relationship $\Delta x = \phi'(t)\,\Delta t$ to define $\Delta_i$ in terms of $\Delta t_i$, we find that

$$s = \lim_{\|\Delta\|_{P_n}\to 0}\ \sum_{i=1}^{n} \sqrt{1 + \left[\frac{\psi'(\xi_i)}{\phi'(\xi_i)}\right]^2}\,\phi'(\xi_i)\,\Delta t_i, \tag{7-42}$$

where $t_{i-1} < \xi_i < t_i$.

Thereafter, the argument that gave rise to Eqn (7-40) now gives rise to

$$s = \int_{T_0}^{T_1} \sqrt{1 + \left[\frac{\psi'(t)}{\phi'(t)}\right]^2}\,\phi'(t)\,dt = \int_{T_0}^{T_1} \sqrt{[\phi'(t)]^2 + [\psi'(t)]^2}\,dt. \tag{7-43}$$

Theorem 7-11 (arc length of parametrically defined curve) Let $\phi(t)$, $\psi(t)$ be differentiable functions in $T_0 \le t \le T_1$. Then the length $s$ of the plane curve defined parametrically by $x = \phi(t)$, $y = \psi(t)$ between the points $(\phi(T_0), \psi(T_0))$, $(\phi(T_1), \psi(T_1))$ is given by

$$s = \int_{T_0}^{T_1} \sqrt{[\phi'(t)]^2 + [\psi'(t)]^2}\,dt.$$
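Theorem 7-11 lends itself to a short numerical illustration. The sketch below is not from the text: the function name `parametric_length`, the use of a midpoint sum, and the two test curves (a circle of radius 2 and one arch of the cycloid $x = t - \sin t$, $y = 1 - \cos t$, whose length is known to be 8) are choices made here for illustration.

```python
import math

def parametric_length(dphi, dpsi, t0, t1, n=1000):
    """Midpoint-sum estimate of the integral of sqrt(phi'(t)^2 + psi'(t)^2) over [t0, t1]."""
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h          # midpoint of the i-th sub-interval
        total += math.hypot(dphi(t), dpsi(t))
    return total * h

# Circle x = a cos t, y = a sin t, 0 <= t <= 2*pi
a = 2.0
circumference = parametric_length(lambda t: -a * math.sin(t),
                                  lambda t: a * math.cos(t),
                                  0.0, 2 * math.pi)

# One arch of the cycloid x = t - sin t, y = 1 - cos t, 0 <= t <= 2*pi
cycloid_arch = parametric_length(lambda t: 1 - math.cos(t),
                                 lambda t: math.sin(t),
                                 0.0, 2 * math.pi)

print(circumference, 2 * math.pi * a)
print(cycloid_arch)
```

Neither curve can be written as a single-valued differentiable function $y = f(x)$ over the whole parameter range in a convenient way, which is exactly the situation the parametric form is meant to handle.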

(b) Area of surface of revolution 

The name surface of revolution is given to any surface which is generated by rotating a plane curve $y = f(x)$ about either the x-axis or the y-axis. Since the determination of the area in either case is exactly similar, we shall discuss only the case of the revolution of the curve $y = f(x)$ about the x-axis, as shown in Fig. 7-11.

A problem arises here as to how to define the area of a non-cylindrical curved surface. We propose to approach the problem by sectioning the surface into annular strips of width $\Delta_i$ as shown in Fig. 7-11, and then to approximate the area $\Delta S$ of each such annular strip by representing it by the conical area which is obtained by rotating the chord $PQ$ of length $\Delta s_i$ about the x-axis. Then if this element of area of cone between the planes $x = x_{i-1}$ and $x = x_i$ is $\Delta S_i$, this will be given by



$$\Delta S_i = 2\pi\left(\frac{y_{i-1} + y_i}{2}\right)\Delta s_i. \tag{7-44}$$

Fig. 7-11 Area of surface of revolution.



Similar elements of area may be defined for each of the other annular strips defined by some partition $P_n$ of the interval $[a, b]$ by the set of points $a = x_0 < x_1 < \cdots < x_n = b$. Thus, denoting the norm of $P_n$ by $\|\Delta\|_{P_n}$, we shall define the area $S$ of the surface of revolution generated by rotating $y = f(x)$ about the x-axis, and contained between the planes $x = a$ and $x = b$, to be

$$S = \lim_{\|\Delta\|_{P_n}\to 0}\ \sum_{i=1}^{n} \Delta S_i = \lim_{\|\Delta\|_{P_n}\to 0}\ \pi\sum_{i=1}^{n} (y_{i-1} + y_i)\,\Delta s_i. \tag{7-45}$$

Hence, if $f(x)$ is differentiable in $a \le x \le b$, by using result (7-38) we find

$$S = \lim_{\|\Delta\|_{P_n}\to 0}\ \pi\sum_{i=1}^{n} (y_{i-1} + y_i)\sqrt{1 + [f'(\xi_i)]^2}\,\Delta_i, \tag{7-46}$$

where $x_{i-1} < \xi_i < x_i$.

Once again our previous form of argument shows that this is just the definite integral of the function $2\pi f(x)\sqrt{1 + [f'(x)]^2}$ integrated from $x = a$ to $x = b$, and so

$$S = 2\pi\int_a^b f(x)\sqrt{1 + [f'(x)]^2}\,dx. \tag{7-47}$$



Theorem 7-12 (area of surface of revolution) Let $f(x)$ be a differentiable function on $a \le x \le b$. Then the area $S$ of the surface of revolution generated by rotating the graph of the function $y = f(x)$ about the x-axis, and contained between the planes $x = a$ and $x = b$, is given by

$$S = 2\pi\int_a^b f(x)\sqrt{1 + [f'(x)]^2}\,dx.$$

Example 7-10 Find the area contained between the planes x = — 1 and 
x = 2 of the surface of revolution about the x-axis of the curve y = cosh x. 

Solution We have $a = -1$, $b = 2$, and $f(x) = \cosh x$, and so $f'(x) = \sinh x$, whence

$$S = 2\pi\int_{-1}^{2} \cosh x\,\sqrt{1 + \sinh^2 x}\,dx = 2\pi\int_{-1}^{2} \cosh^2 x\,dx.$$

To evaluate this result we now use the hyperbolic identity $\cosh^2 x = \tfrac{1}{2}(1 + \cosh 2x)$ to obtain

$$S = \pi\int_{-1}^{2} (1 + \cosh 2x)\,dx.$$

Then, as it is easily verified that $\tfrac{1}{2}\sinh 2x + C$ is an antiderivative of $\cosh 2x$, we have from Theorem 7-7 that

$$S = \pi\int_{-1}^{2} (1 + \cosh 2x)\,dx = \pi\left(x + \tfrac{1}{2}\sinh 2x + C\right)\Big|_{-1}^{2} = \tfrac{1}{2}\pi(6 + \sinh 4 + \sinh 2).$$
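The conical elements (7-44) used to define the area can be summed numerically as a check on Example 7-10. This is a sketch under assumptions: the name `frustum_area` and the use of equal strips are not from the text.

```python
import math

def frustum_area(f, a, b, n):
    """Sum of the conical elements pi*(y_{i-1} + y_i)*(chord length) over n equal strips."""
    h = (b - a) / n
    total = 0.0
    y_prev = f(a)
    for i in range(1, n + 1):
        y_next = f(a + i * h)
        chord = math.hypot(h, y_next - y_prev)   # length of chord PQ
        total += math.pi * (y_prev + y_next) * chord
        y_prev = y_next
    return total

exact = 0.5 * math.pi * (6 + math.sinh(4) + math.sinh(2))
for n in (10, 100, 1000):
    print(n, frustum_area(math.cosh, -1.0, 2.0, n), exact)
```

The sums settle towards the closed-form value as the number of strips grows, in line with definition (7-45).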

(c) Volume of revolution 

Finally, let us determine the volume of revolution $V$ of the volume shown in Fig. 7-11. This time, to define the volume of such a figure, we consider cylindrical elements of volume of thickness $\Delta_i$, and place upper and lower bounds on that element of volume by the obvious inequality:

$$\pi \times (\text{least radius of annulus})^2 \times \Delta_i \le \text{element of volume} \le \pi \times (\text{greatest radius of annulus})^2 \times \Delta_i.$$

Then, if $x_{i-1} < \xi_i < x_i$, a volume element $\Delta V_i$ satisfying this inequality and bounded to the left by the plane $x = x_{i-1}$ and to the right by the plane $x = x_i$ is

$$\Delta V_i = \pi[f(\xi_i)]^2\,\Delta_i. \tag{7-48}$$

The volume of revolution generated by rotating $y = f(x)$ about the x-axis, and contained between the planes $x = a$ and $x = b$, will then be defined to be

$$V = \lim_{\|\Delta\|_{P_n}\to 0}\ \sum_{i=1}^{n} \pi[f(\xi_i)]^2\,\Delta_i. \tag{7-49}$$

A repetition of the previous form of argument then yields

$$V = \pi\int_a^b [f(x)]^2\,dx. \tag{7-50}$$

Notice that we have imposed no differentiability requirements on $f(x)$, so that result (7-50) is applicable even if $f(x)$ is only piecewise continuous.

Theorem 7-13 (volume of solid of revolution) Let $f(x)$ be a piecewise continuous function on $a \le x \le b$. Then the volume of the solid of revolution generated by rotating the curve $y = f(x)$ about the x-axis, and contained between the planes $x = a$ and $x = b$, is given by

$$V = \pi\int_a^b [f(x)]^2\,dx.$$

Example 7-11 Determine the volume of revolution generated by rotating the 
parabola y = 1 + x 2 about the x-axis, and contained between the planes 
x = 1 and x = 2. 

Solution Here we have $a = 1$, $b = 2$, and $f(x) = 1 + x^2$, so that

$$V = \pi\int_1^2 (1 + x^2)^2\,dx = \pi\int_1^2 (1 + 2x^2 + x^4)\,dx = \pi\left(x + \frac{2x^3}{3} + \frac{x^5}{5}\right)\Big|_1^2 = \frac{178\pi}{15}.$$
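The cylindrical elements (7-48) can also be summed directly. In the sketch below the point $\xi_i$ in each sub-interval is taken to be its midpoint; that choice, and the name `disc_volume`, are assumptions made for illustration, since the text leaves $\xi_i$ free.

```python
import math

def disc_volume(f, a, b, n):
    """Sum of the elements pi*f(xi_i)^2*Delta_i, with xi_i the midpoint of each strip."""
    h = (b - a) / n
    return sum(math.pi * f(a + (i + 0.5) * h) ** 2 * h for i in range(n))

exact = 178 * math.pi / 15      # value obtained in Example 7-11
for n in (10, 100, 1000):
    print(n, disc_volume(lambda x: 1 + x * x, 1.0, 2.0, n), exact)
```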



7-7 Numerical integration 

From the second fundamental theorem of calculus we have seen that the 
successful analytical evaluation of a definite integral involves the deter- 
mination of an antiderivative of the integrand. Although in many practical 
cases of importance an antiderivative can be found, the fact remains that in 
general this is not possible and Theorem 7-7 is therefore of no avail. Such, 
for example, is the case with an integral as simple as

$$\int_a^b e^{-x^2}\,dx,$$

for although an antiderivative of $e^{-x^2}$ certainly exists on theoretical grounds, it is not expressible in terms of elementary functions.

Of the many possible methods whereby a numerical estimate of the value of a definite integral may be made, we choose to mention only the very simplest ones here. The general process of evaluating a definite integral by numerical means will be referred to as numerical integration, though the old-fashioned term numerical quadrature is still often employed for such a process. The matter of the accuracy of these methods will be taken up elsewhere in connection with applications of Taylor's theorem.

Fig. 7-12 Trapezoidal approximation of area.

(a) Trapezoidal rule 

Although a strictly analytical derivation of the so-called trapezoidal rule for integration may be given, we shall not use this approach, and instead make appeal to the area representation of a definite integral. Consider Fig. 7-12, and let us estimate the shaded area below the curve $y = f(x)$ which we know has the value

$$\int_a^b f(x)\,dx.$$

Let us begin by taking any set of $n + 1$ points $a = x_0 < x_1 < \cdots < x_n = b$, and on each interval $[x_{i-1}, x_i]$ approximate the true area above it by the trapezium obtained by replacing the arc of the curve through the points $(x_{i-1}, f(x_{i-1}))$, $(x_i, f(x_i))$ by the chord joining these two points.

Then the area of the trapezium on the interval $[x_{i-1}, x_i]$ is

$$\tfrac{1}{2}\bigl(f(x_{i-1}) + f(x_i)\bigr)\,\Delta x_i,$$

where $\Delta x_i = x_i - x_{i-1}$.

Thus, adding the $n$ contributions of this type, we arrive at the general trapezoidal rule

$$\int_a^b f(x)\,dx \approx \tfrac{1}{2}\bigl(f(x_0) + f(x_1)\bigr)\Delta x_1 + \tfrac{1}{2}\bigl(f(x_1) + f(x_2)\bigr)\Delta x_2 + \cdots + \tfrac{1}{2}\bigl(f(x_{n-1}) + f(x_n)\bigr)\Delta x_n. \tag{7-51}$$

If the interval $[a, b]$ is divided into $n$ equal parts of length $h = (b - a)/n$, then (7-51) becomes the trapezoidal rule for equal intervals

$$\int_a^b f(x)\,dx = h\left[\tfrac{1}{2}f(x_0) + f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + \tfrac{1}{2}f(x_n)\right] + \varepsilon(h), \tag{7-52}$$

where an equality sign has now been used because we have included the error term $\varepsilon(h)$, which recognizes that the error is, in part, dependent on the magnitude of $h$.
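The rule (7-52), without its error term, translates almost line for line into code. The sketch below is illustrative: the name `trapezoidal` is an assumption, and the test integral of $e^{-x^2}$ over $[0, 1]$ is chosen here only because its value can be expressed through the error function available in the Python standard library.

```python
import math

def trapezoidal(f, a, b, n):
    """Trapezoidal rule for n equal intervals of length h = (b - a)/n."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))          # end ordinates carry weight 1/2
    for i in range(1, n):
        total += f(a + i * h)            # interior ordinates carry weight 1
    return h * total

reference = 0.5 * math.sqrt(math.pi) * math.erf(1.0)   # integral of exp(-x^2) on [0, 1]
for n in (4, 16, 64):
    estimate = trapezoidal(lambda x: math.exp(-x * x), 0.0, 1.0, n)
    print(n, estimate, estimate - reference)
```

The printed differences shrink roughly by a factor of 16 each time $n$ is multiplied by 4, which is the kind of dependence on $h$ that the error term $\varepsilon(h)$ is meant to record.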

(b) Simpson's rule 

A different approach involves dividing $[a, b]$ into an even number $n$ of sub-intervals of equal length $h = (b - a)/n$, and then approximating the function over consecutive pairs of sub-intervals by a quadratic polynomial. That is to say, we fit a parabola to the three points $(a, f(a))$, $(a + h, f(a + h))$, $(a + 2h, f(a + 2h))$ comprising the first two sub-intervals, and thereafter repeat the process until the whole of the interval $[a, b]$ has been covered. The value of the definite integral can then be estimated by integrating the successive quadratic approximations over their respective intervals of length $2h$ and adding the results. This simple idea leads to Simpson's rule for numerical integration, which we now formulate in analytical terms.

Consider the first interval $[a, a + 2h]$, and represent the function $y = f(x)$ in this interval by the quadratic

$$y = c_0 + c_1 x + c_2 x^2. \tag{7-53}$$

Then the approximation to the desired integral taken over this interval is

$$\int_a^{a+2h} f(x)\,dx \approx \int_a^{a+2h} (c_0 + c_1 x + c_2 x^2)\,dx = \left(c_0 x + \frac{c_1 x^2}{2} + \frac{c_2 x^3}{3}\right)\Big|_a^{a+2h}. \tag{7-54}$$

To determine the coefficients $c_0$, $c_1$, and $c_2$ in order that the quadratic should pass through the three points $(a, f(a))$, $(a + h, f(a + h))$, $(a + 2h, f(a + 2h))$ we must solve the three simultaneous equations

$$\begin{aligned} f(a) &= c_0 + c_1 a + c_2 a^2, \\ f(a + h) &= c_0 + c_1(a + h) + c_2(a + h)^2, \\ f(a + 2h) &= c_0 + c_1(a + 2h) + c_2(a + 2h)^2. \end{aligned} \tag{7-55}$$

When this is done and the results are substituted into Eqn (7-54) we arrive at the desired result

$$\int_a^{a+2h} f(x)\,dx = \frac{h}{3}\bigl(f(a) + 4f(a + h) + f(a + 2h)\bigr) + \varepsilon(h), \tag{7-56}$$

where again we have included the error term $\varepsilon(h)$. In its simplest form Eqn (7-56), together with its error term, is called Simpson's rule. An explicit form for $\varepsilon(h)$ in both the trapezium rule and Simpson's rule will be given later.

If, now, result (7-56) is applied to the intervals $[a, a + 2h]$, $[a + 2h, a + 4h]$, ..., $[a + (n - 2)h, b]$ and the results are added, we arrive at Simpson's rule for an even number $n$ of intervals

$$\int_a^b f(x)\,dx = \frac{h}{3}\bigl[f(a) + 4f(a + h) + 2f(a + 2h) + 4f(a + 3h) + \cdots + 4f(a + (n - 1)h) + f(b)\bigr] + \varepsilon(h), \tag{7-57}$$

where $h = (b - a)/n$.
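A minimal sketch of rule (7-57), with the error term dropped, is given below. The function name `simpson` and the sample cubic integrated over $[0, 2]$ are assumptions chosen for illustration; the cubic has the exact integral 10, so the output shows the rule reproducing a cubic exactly even though it is built from parabolas.

```python
def simpson(f, a, b, n):
    """Simpson's rule for an even number n of equal intervals of length h = (b - a)/n."""
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        weight = 4 if k % 2 == 1 else 2   # weights 4, 2, 4, ..., 4 on interior ordinates
        total += weight * f(a + k * h)
    return h * total / 3

def cubic(x):
    return x ** 3 + 2 * x + 1             # exact integral over [0, 2] is 4 + 4 + 2 = 10

print(simpson(cubic, 0.0, 2.0, 4))
```

The check on `n` reflects the requirement in the text that the sub-intervals be taken in consecutive pairs.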



Example 7-12 Calculate the definite integral

$$I = \int_1^2 \frac{dx}{x}$$

by the trapezoidal rule and by Simpson's rule, taking ten integration steps of length $h = 0.1$.

Solution We start by tabulating the functional values of the integrand $1/x$ at intervals of 0.1.

    x      1/x
    1.0    1.0000
    1.1    0.9091
    1.2    0.8333
    1.3    0.7692
    1.4    0.7143
    1.5    0.6667
    1.6    0.6250
    1.7    0.5882
    1.8    0.5556
    1.9    0.5263
    2.0    0.5000

Then, using the trapezoidal rule (7-52), we find

$$I \approx 0.1 \times [0.5000 + 0.9091 + 0.8333 + 0.7692 + 0.7143 + 0.6667 + 0.6250 + 0.5882 + 0.5556 + 0.5263 + 0.25],$$

whence $I \approx 0.6938$.

The same calculation using Simpson's rule, (7-57), gives

$$I \approx \frac{0.1}{3} \times [1.0000 + 4 \times (0.9091) + 2 \times (0.8333) + 4 \times (0.7692) + 2 \times (0.7143) + 4 \times (0.6667) + 2 \times (0.6250) + 4 \times (0.5882) + 2 \times (0.5556) + 4 \times (0.5263) + 0.5000],$$

whence $I \approx 0.6932$.



In actual fact the exact result of this definite integral is $\log_e 2 = 0.69315$. As would have been expected on intuitive grounds, Simpson's rule is more accurate than the trapezoidal rule.
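The arithmetic of Example 7-12 can be repeated from the tabulated ordinates alone, which is how both rules are applied when a function is known only through a table. The sketch below is illustrative; the variable names are assumptions, and the ordinates are the four-decimal values of $1/x$ from the table.

```python
import math

# Four-decimal ordinates of 1/x at x = 1.0, 1.1, ..., 2.0
values = [1.0000, 0.9091, 0.8333, 0.7692, 0.7143, 0.6667,
          0.6250, 0.5882, 0.5556, 0.5263, 0.5000]
h = 0.1

# Trapezoidal rule (7-52): half weight on the two end ordinates
trap = h * (0.5 * values[0] + sum(values[1:-1]) + 0.5 * values[-1])

# Simpson's rule (7-57): weight 4 on odd-numbered, 2 on even-numbered interior ordinates
simp = (h / 3) * (values[0] + 4 * sum(values[1:-1:2])
                  + 2 * sum(values[2:-1:2]) + values[-1])

print("trapezoidal:", round(trap, 5))
print("Simpson:    ", round(simp, 5))
print("log 2:      ", round(math.log(2), 5))
```

Because the ordinates are rounded to four places, the last digit of each estimate can differ slightly from a calculation carried out with unrounded values of $1/x$.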

(c) Integration of interpolating polynomials 

A direct extension of the previous method that may be exploited systematically to produce integration formulae of high accuracy and flexibility involves the replacement of the function $y = f(x)$ over the interval $[a, b]$ by an interpolating polynomial of degree $n$. Thus, on the interval $[a, b]$, the function $y = f(x)$ is represented by

$$y = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n, \tag{7-58}$$

and the numerical integration formula then follows by writing

$$\int_a^b f(x)\,dx \approx \int_a^b (c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n)\,dx. \tag{7-59}$$

Thus, if the error term is again represented by $\varepsilon(h)$, we obtain the numerical integration formula

$$\int_a^b f(x)\,dx = c_0(b - a) + \frac{c_1}{2}(b^2 - a^2) + \frac{c_2}{3}(b^3 - a^3) + \cdots + \frac{c_n}{n + 1}(b^{n+1} - a^{n+1}) + \varepsilon(h). \tag{7-60}$$

The difficulty in this approach arises from the fact that the sense in which Eqn (7-58) is to approximate $y = f(x)$ is still to be defined, and this will influence both the method by which the $n + 1$ coefficients $c_0, c_1, \ldots, c_n$ are to be determined and, naturally, the error term $\varepsilon(h)$.

Probably the simplest choice of approximating polynomial, and the only one to be discussed here, is determined by the requirement that the polynomial and the function should have identical values at $n + 1$ points $x_0 < x_1 < \cdots < x_n$ belonging to $[a, b]$. That is, the requirement that the graph of Eqn (7-58) should pass through the $n + 1$ points $(x_0, f(x_0))$, $(x_1, f(x_1))$, ..., $(x_n, f(x_n))$. Such a polynomial is called a Lagrangian interpolation polynomial, and its form may be written down directly as follows. We illustrate the Lagrangian interpolation polynomial $L_3(x)$ of degree 3, which passes through the four points $(x_0, f(x_0))$, $(x_1, f(x_1))$, $(x_2, f(x_2))$, and $(x_3, f(x_3))$. Higher degree polynomials may be constructed in a similar manner.

$$\begin{aligned} L_3(x) ={}& \frac{(x - x_1)(x - x_2)(x - x_3)}{(x_0 - x_1)(x_0 - x_2)(x_0 - x_3)}\,f(x_0) + \frac{(x - x_0)(x - x_2)(x - x_3)}{(x_1 - x_0)(x_1 - x_2)(x_1 - x_3)}\,f(x_1) \\ &+ \frac{(x - x_0)(x - x_1)(x - x_3)}{(x_2 - x_0)(x_2 - x_1)(x_2 - x_3)}\,f(x_2) + \frac{(x - x_0)(x - x_1)(x - x_2)}{(x_3 - x_0)(x_3 - x_1)(x_3 - x_2)}\,f(x_3). \end{aligned} \tag{7-61}$$

This form of approach to the development of an integration formula is essential when, as is often the case, the function $f(x)$ is only known in tabular form.



Example 7-13 Given the following tabular values of a function $f(x)$, derive the Lagrangian interpolation formula $L_3(x)$ for $f(x)$.

    r    x_r    f(x_r)
    0    2      2.131
    1    4      1.242
    2    6      4.507
    3    7      9.702

Solution It follows by direct substitution into Eqn (7-61) that

$$\begin{aligned} L_3(x) ={}& \frac{(x - 4)(x - 6)(x - 7)}{(-2)(-4)(-5)} \times (2.131) + \frac{(x - 2)(x - 6)(x - 7)}{(2)(-2)(-3)} \times (1.242) \\ &+ \frac{(x - 2)(x - 4)(x - 7)}{(4)(2)(-1)} \times (4.507) + \frac{(x - 2)(x - 4)(x - 6)}{(5)(3)(1)} \times (9.702). \end{aligned}$$

Simplification of this will yield the required third degree polynomial which may, if desired, then be integrated over any sub-interval of the interval $[2, 7]$ on which $f(x)$ is defined, thereby yielding an approximation to the definite integral of $f(x)$ integrated over that same sub-interval.
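The construction in Eqn (7-61) extends to any number of points, and a short routine makes the pattern explicit. The sketch below is an illustration only: the name `lagrange` is an assumption, and the first tabulated value is read as 2.131. Since Simpson's rule is exact for cubics, a single application of (7-56) over $[2, 7]$ is used to integrate $L_3(x)$.

```python
def lagrange(xs, ys):
    """Return the Lagrangian interpolation polynomial through the points (xs[i], ys[i])."""
    def polynomial(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)   # factor vanishing at every other node
            total += term
        return total
    return polynomial

xs = [2.0, 4.0, 6.0, 7.0]
ys = [2.131, 1.242, 4.507, 9.702]       # tabulated values, first entry assumed to be 2.131
L3 = lagrange(xs, ys)

for x, y in zip(xs, ys):
    print(x, L3(x), y)                  # the polynomial reproduces the table

# One application of Simpson's rule (7-56) with h = 2.5, exact for a cubic
h = 2.5
integral_2_to_7 = (h / 3) * (L3(2.0) + 4 * L3(4.5) + L3(7.0))
print(integral_2_to_7)
```

The printed integral approximates the definite integral of $f(x)$ over $[2, 7]$ only to the extent that the cubic represents the tabulated function well.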



PROBLEMS 

Section 7-1

7-1 Let $f(x) = \lambda x$ on some closed interval $a \le x \le b$ lying in the positive part of the x-axis, where $\lambda > 0$ is a constant. Then, if $P_n$ is a partition of $[a, b]$ into $n$ sub-intervals of equal length, determine the form of the lower and upper sums $\underline{S}_{P_n}$, $\overline{S}_{P_n}$ for $f(x)$ taken over this partition and prove directly by taking the limit that

$$\lim_{n\to\infty} \underline{S}_{P_n} = \lim_{n\to\infty} \overline{S}_{P_n}.$$

Hence deduce that

$$\int_a^b \lambda x\,dx = \frac{\lambda}{2}(b^2 - a^2).$$



7-2 Let $\lambda, \mu > 0$ be constants, and set $f(x) = \mu + \lambda x$ on some closed interval $a \le x \le b$ lying in the positive part of the x-axis. Show, using the method of Problem 7-1, that

$$\int_a^b (\mu + \lambda x)\,dx = \mu(b - a) + \frac{\lambda}{2}(b^2 - a^2). \tag{A}$$

Show also by this method that

$$\int_a^b \mu\,dx = \mu(b - a), \tag{B}$$

and deduce from (A), (B) and the result of Problem 7-1 that

$$\int_a^b (\mu + \lambda x)\,dx = \int_a^b \mu\,dx + \int_a^b \lambda x\,dx.$$

This provides a direct proof of the linearity of the operation of integration in the special case that $f(x) = \mu + \lambda x$.

7-3 Let $f(x) = e^{\lambda x}$, and take $P_n$ to be a partition of the closed interval $[a, b]$ into $n$ sub-intervals of equal length. By taking the numbers $\xi_i$ of Definition 7-1 to be at the left-hand end points of the sub-intervals, compute the approximating sum $S_{P_n}$ corresponding to $f(x) = e^{\lambda x}$, and by finding its limit prove that

$$\int_a^b e^{\lambda x}\,dx = \frac{1}{\lambda}(e^{\lambda b} - e^{\lambda a}).$$

7-4 If $a < k < b$, use the result of Problem 7-3 to deduce that

$$\int_a^b e^{\lambda x}\,dx = \int_a^k e^{\lambda x}\,dx + \int_k^b e^{\lambda x}\,dx.$$

This provides a direct proof that the operation of integration is additive with respect to the interval of integration in the special case that $f(x) = e^{\lambda x}$.

7-5 Let $[a, b]$ be any closed interval not containing the origin, and denote by $P_m$ the partition of this interval into $m$ equal sub-intervals each of length $(b - a)/m$. Denote by $x_r$ the point $x_r = a + (r/m)(b - a)$ lying at the right-hand end point of the $r$th interval. Then, by setting $\xi_r = \sqrt{x_{r-1}x_r}$ show, by considering $x_{r-1} - \xi_r$ and $x_r - \xi_r$, that $x_{r-1} < \xi_r < x_r$. By writing $f(x) = 1/x^2$ in Definition 7-2, and taking $P_m$ and the points $\xi_r$ in that definition to be as defined above, prove that

$$\int_a^b \frac{dx}{x^2} = \left(\frac{1}{a} - \frac{1}{b}\right).$$

Hint: Use the fact that

$$\sum_{r=1}^{m} \frac{1}{x_{r-1}x_r} = \sum_{r=1}^{m} \left(\frac{1}{x_r - x_{r-1}}\right)\left(\frac{1}{x_{r-1}} - \frac{1}{x_r}\right).$$

7-6 Determine the lower bounds $m_r$ and the upper bounds $M_r$ of the function $f(x) = 1/(1 + x^2)$ in each of the $n$ adjacent sub-intervals of length $1/n$ comprising a partition $P_n$ of the closed interval $[0, 1]$. Use these results to deduce the form taken by the upper and lower sums $\overline{S}_{P_n}$, $\underline{S}_{P_n}$ and show that

$$\lim_{n\to\infty} (\overline{S}_{P_n} - \underline{S}_{P_n}) = 0.$$

Deduce from this that

$$\int_0^1 \frac{dx}{1 + x^2} = \lim_{n\to\infty} n\left\{\frac{1}{n^2 + 1^2} + \frac{1}{n^2 + 2^2} + \frac{1}{n^2 + 3^2} + \cdots + \frac{1}{n^2 + n^2}\right\}$$

or, equivalently,

$$\int_0^1 \frac{dx}{1 + x^2} = \lim_{n\to\infty} n\left\{\frac{1}{n^2} + \frac{1}{n^2 + 1^2} + \frac{1}{n^2 + 2^2} + \cdots + \frac{1}{n^2 + (n - 1)^2}\right\}.$$

We shall see later that this integral has the value $\tfrac{1}{4}\pi$, and so each of these different expressions has this same interesting limit.

Section 7-2 

7-7 Outline the proofs of the results of Theorem 7-3. 

7-8 If f(x) = 2x - 3, use result (A) of Problem 7-2 to evaluate the definite 
integral 



I 



(2x - 3)dx. 



Rewrite this as the sum of two definite integrals each with a non-negative inte- 
grand and verify that their sum leads to the same result. 

7-9 Use the result of Problem 7-3 to evaluate the definite integral 

r2 



I 



e~ 3x dx. 

'4 



7-10 Find the area $I$ between the curves $y = x^2 + 2$ and $y = -x + 1$, which is bounded to the left by the line $x = -1$ and to the right by the line $x = 2$.

7-11 Discuss, without attempting to evaluate any integrals that are involved, the problem of determining the area between the curves $y = 1 + \sin x$ and $y = 1 + \cos x$ which is bounded to the left by the line $x = 0$ and to the right by the line $x = 2\pi$.

7-12 Find the area $I$ between the two curves $y = 1/x^2$ and $y = e^{0.5x} - 3$, which is bounded to the left by the line $x = 1$ and to the right by the line $x = 2$.

7-13 Evaluate the integral 
"*/W dx, 



-f 




given that 

Ix for < x < 1 ; 

/(*)= 2 + 2x for 1 <x<2; 

U - 1 for 2 < x < 3. 

7-14 On the assumption that the definite integral

$$\int_a^b \frac{dx}{\sqrt{1 - x^2}} = \arcsin b - \arcsin a,$$

prove that the improper integral

$$\int_0^1 \frac{dx}{\sqrt{1 - x^2}}$$

is convergent, and determine its value.

7-15 Sketch the area bounded below by the positive x-axis, and above by the line $y = x$ on the interval $0 \le x \le 1$, and by the curve $y = 1/x^2$ on the interval $1 \le x < \infty$. Determine this area $I$ by the use of an improper integral combined with elementary geometrical arguments.

Section 7-3 

7-16 Use Theorem 7-4 to place bounds on the value of the definite integral 

/ = I e - * 2 cos 3 x dx. 



= ' 



7-17 Evaluate the definite integral 
x 2 dx, 



1 



and use the result to determine the number g in Theorem 7'5 when it is applied 
to this definite integral. Is the number I unique? Repeat the argument, but 
this time applying it to the definite integral 



i 



2 

x 2 dx. 

2 



Is there a unique number f in this case? 

7-18 Prove the following result which is a restricted form of the second mean value theorem for integrals. Let $f(x) > 0$ be continuous and monotonic decreasing on $[a, b]$, and let $g(x) > 0$ be continuous on $[a, b]$. Then,

$$\int_a^b f(x)g(x)\,dx = f(a)\int_a^{\xi} g(x)\,dx,$$

where $a < \xi < b$. State the corresponding form of the theorem when $f(x) > 0$ is continuous and monotonic increasing on $[a, b]$. [Hint: Consider the integrand $f(a)\{f(x)g(x)/f(a)\}$ and use Theorem 7-4.]

7-19 The requirement of continuity for $f(x)$ in Theorem 7-5 is essential, for without it the result of the theorem may, or may not, be true. Illustrate this by considering step functions $f(x)$ defined on the interval $[1, 4]$, and show that it is possible to define ones for which,

(a) no number $\xi$ exists which satisfies Theorem 7-5;

(b) an infinity of numbers $\xi$ exist satisfying Theorem 7-5.



Section 7-4 

7-20 Use Theorem 7-7 to evaluate the following definite integrals: 

(a) (x 5 ' 2 + 3e*)dx, (b) sin x dx, (c) sin x dx, 



rb f" r%« 

(x 5 ' 2 + 3e*)dx, (b) sin x dx, (c) si 

Ja Jo Jo 

f 

Jo 



(d) | sin x | dx. 

Jo 

7-21 Use Theorem 7-7 to determine the area contained between the x-axis and the curve $y = 1 + x^3 + 2\sin x$, which is bounded to the left by the line $x = 0$ and to the right by the line $x = \pi$.

7-22 Using the basic properties of the logarithmic function listed in Section 6-3, express $\log_e x$ in terms of an indefinite integral, and sketch the interpretation of the result as an area below a curve.



Section 7-5 

7-23 Apply Theorem 7-8 to the following integral, but do not attempt to evaluate 
the result : 



Ja 



l + o 2 

t~ x cos ax dx. 



7-24 Apply Theorem 7-8 to the following integral, but do not attempt to evaluate 
the result: 



j* 00 

1(a) = x"-^-* dx, (a > 0). 



7-25 This problem outlines an alternative form of proof for the result of Theorem 7-8. It is based on the chain rule for differentiation and on a direct proof of Corollary 7-8. Define the function $F(\alpha, \phi(\alpha), \psi(\alpha))$ by the equation

$$F(\alpha, \phi(\alpha), \psi(\alpha)) = \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha)\,dx.$$

Then it follows from the chain rule for differentiation that the derivative of the integral with respect to $\alpha$ is given by

$$\frac{dF}{d\alpha} = \frac{\partial F}{\partial\alpha} + \frac{\partial F}{\partial\psi}\frac{d\psi}{d\alpha} + \frac{\partial F}{\partial\phi}\frac{d\phi}{d\alpha}. \tag{A}$$

Use Definition 7-3 together with the first fundamental theorem of calculus to prove that

$$\frac{\partial F}{\partial\psi} = f[\psi(\alpha), \alpha] \quad\text{and}\quad \frac{\partial F}{\partial\phi} = -f[\phi(\alpha), \alpha]. \tag{B}$$

Finally, obtain the statement of Theorem 7-8 by substituting results (B) into (A) and giving a direct proof, as in the text, that

$$\frac{\partial}{\partial\alpha}\int_{\phi}^{\psi} f(x, \alpha)\,dx = \int_{\phi}^{\psi} \frac{\partial f}{\partial\alpha}\,dx,$$

where for the purposes of partial differentiation with respect to $\alpha$, the limits $\phi$, $\psi$ are to be regarded as constants.



Section 7-6 

7-26 Express in terms of a definite integral the arc length of the curve y = 1 + x 2 
+ sin 2x, that lies between the points on the curve corresponding to x = 1 
and x = 4. 

7-27 Prove that the circumference of a circle of radius $a$ is $2\pi a$ by using the parametric equations of a circle $x = a\cos t$, $y = a\sin t$ with $0 \le t \le 2\pi$.

7-28 Find the area contained between the planes $x = -2$ and $x = 3$ of the surface of revolution about the x-axis generated by the curve $y = 2 + \cosh x$. [Hint: An antiderivative of $\cosh x$ is $\sinh x + C$.]

7-29 If the curve $y = f(x)$ has an inverse $x = \phi(y)$, state the form taken by Theorem 7-12 when the curve $y = f(x)$ between the points $(a, f(a))$ and $(b, f(b))$ is rotated about the y-axis.

7-30 Determine the volume contained between the parabola $y = 2 + x + x^2$ and the cubic $y = 5 + 2x + x^3$, which lies between the planes $x = 1$ and $x = 2$.

7-31 If the curve $y = f(x)$ has an inverse $x = \phi(y)$, state the form taken by Theorem 7-13 when the curve $y = f(x)$ between the points $(a, f(a))$ and $(b, f(b))$ is rotated about the y-axis.



Section 7-7 

7-32 Evaluate the definite integral 

"3 

(x 3 + 2x+ \)dx 



f 



by the trapezoidal rule using four intervals of equal length and then by 
Simpson's rule for the same intervals. Compare the result with that obtained 
by direct integration. Infer from your result that Simpson's rule is exact for 
cubic equations despite the fact that it is based on a parabolic fitting of the 
function. 

7-33 State the form of the Lagrangian interpolation formula $L_2(x)$, and use it to deduce Simpson's rule, (7-56), by applying it to the three points $(a, f(a))$, $(a + h, f(a + h))$ and $(a + 2h, f(a + 2h))$ through which the function $y = f(x)$ passes.



7-34 Let the curve $\Gamma$ be defined in terms of the polar coordinates $(r, \theta)$ by means of the equation

$$r = f(\theta),$$

where $f(\theta)$ is a continuous function. Then if $P_n$ is a partition of the interval $\alpha \le \theta \le \beta$ into the points $\alpha = \theta_0 < \theta_1 < \cdots < \theta_n = \beta$ with the norm $\|\Delta\|_{P_n}$, prove that the area $A$ between the origin and the curve $\Gamma$ which is bounded by the radius vectors $\theta = \alpha$ and $\theta = \beta$ is given by

$$A = \lim_{\|\Delta\|_{P_n}\to 0}\ \sum_{i=1}^{n} \tfrac{1}{2}f^2(\xi_i)\,\Delta_i,$$

where $\theta_{i-1} < \xi_i < \theta_i$ and $\Delta_i = \theta_i - \theta_{i-1}$. Hence deduce that

$$A = \frac{1}{2}\int_{\alpha}^{\beta} f^2(\theta)\,d\theta.$$

Use this result to find the area swept out by the radius vector drawn from the origin to the Archimedean spiral $r = k\theta$ between the radius vectors $\theta = \alpha$ and $\theta = \beta$, with $\beta > \alpha$.



7-35 Consider a straight rod of length $L$ which has a uniform cross-sectional area. Aligning the x-axis with the rod in such a manner that the origin coincides with the left-hand end point, assume that the mass $M(x)$ of material contained in the rod in the interval $[0, x]$ is given by

$$M(x) = \int_0^x \rho(t)\,dt.$$

Then the essentially non-negative function $\rho(x)$ is called the linear density distribution of the matter in the rod, and by the first fundamental theorem of calculus it follows that $\rho(x) = M'(x)$.

Now in mechanics the moment of inertia $I$ about an axis of a point mass $m$ situated at a perpendicular distance $x$ from that axis is defined to be $mx^2$. By considering a partition $P_n$ of $0 \le x \le L$ into the points $0 = x_0 < x_1 < \cdots < x_n = L$ with the norm $\|\Delta\|_{P_n}$, prove that the moment of inertia $I$ of the rod about an axis perpendicular to the rod and passing through an end point is given by

$$I = \lim_{\|\Delta\|_{P_n}\to 0}\ \sum_{i=1}^{n} \xi_i^2\,\rho(\xi_i)\,\Delta_i,$$

where $x_{i-1} < \xi_i < x_i$ and $\Delta_i = x_i - x_{i-1}$. Hence deduce that

$$I = \int_0^L x^2\rho(x)\,dx.$$

In the case of a rod of mass $M$ having a uniform linear density $\rho(x) = \rho_0$, deduce the relationship between $\rho_0$ and $M$ and use it to prove that the moment of inertia of the rod about an axis perpendicular to its length and passing through an end point is

$$\frac{ML^2}{3}.$$



7-36 Consider a circular disk of radius $a$, and suppose that the mass $M(r)$ of material contained within a circle of radius $r$ drawn about its centre is given by

$$M(r) = 2\pi\int_0^r t\rho(t)\,dt.$$

Then the essentially non-negative function $\rho(r)$ is called the area density distribution of the matter in the disk, and by the first fundamental theorem of calculus it follows that $2\pi r\rho(r) = M'(r)$.

Use the form of argument outlined in the previous problem to prove that the moment of inertia $I$ of the disk about an axis perpendicular to its plane and passing through its centre is given by

$$I = 2\pi\int_0^a r^3\rho(r)\,dr.$$

If the disk is of mass $M$ and has a uniform area density $\rho(r) = \rho_0$, deduce the relationship between $\rho_0$ and $M$ and use it to prove that the moment of inertia of the disk about an axis perpendicular to its plane and passing through its centre is

$$\frac{Ma^2}{2}.$$

7-37 Indicate by means of simple examples how the integral inequality (7-1) may be used to place upper and lower bounds on the integrals defining the area $A$ and the moment of inertia $I$ in Problems 7-34 to 7-36.



Systematic integration 



8-1 Integration of elementary functions 

The main objective of this chapter is to explore some of the systematic methods for determining an antiderivative, that is, a function $F(x)$ whose derivative is equal to some given function $f(x)$. As described in the previous chapter, we shall denote the antiderivative of the function $f$ by $\int f(x)\,dx$ with the understanding that

$$\int f(x)\,dx = F(x) + C \tag{8-1}$$

with $C$ an arbitrary constant.

Alternatively, as any indefinite integral of $f$ must also be an antiderivative of $f$, we may identify $F(x)$ in Eqn (8-1) with $\int_a^x f(t)\,dt$ where $a$ is arbitrary, to obtain the equivalent expression

$$\int f(x)\,dx = \int_a^x f(t)\,dt + C. \tag{8-2}$$

Remember that the symbol $\int f(x)\,dx$ for the antiderivative of $f$ derives from differentiation and denotes the most general function whose derivative is $f$. The allied symbol $\int_a^b f(x)\,dx$, denoting a definite integral of $f$, derives from integration and is simply a real number. Considering the definition of an antiderivative, we shall say that two antiderivatives are equal if they only differ by a constant.

It should be recalled that the connection between the concepts of an antiderivative and a definite integral is provided by the fundamental theorem of calculus, which asserts that

$$\int_a^b f(x)\,dx = \left(\int f(x)\,dx\right)_{x=b} - \left(\int f(x)\,dx\right)_{x=a}.$$

In view of Eqn (8-1) this may be written

$$\int_a^b f(x)\,dx = F(b) - F(a). \tag{8-3}$$



Very often in texts the term indefinite integral is loosely ascribed to the entire right-hand side of Eqn (8-2) instead of, as here, only to its first term. This is usually justified by the fact that $a$ is arbitrary though, of course, it does not necessarily follow that all possible constants $C$ can be absorbed into the integral by a suitable choice of $a$. For example, we have the antiderivative

$$\int \cos x\,dx = \sin x + C,$$

though if for some particular problem it was appropriate to set $C = 3$, say, then no choice of the arbitrary constant $a$ would enable us to equate $\int_a^x \cos t\,dt$ and $\sin x + 3$, for this would imply that $\sin a = -3$.

Unfortunately, the theorems for the differentiation of wide classes of 
functions seldom have any counterpart for determining antiderivatives. 
Ultimately, success in finding an antiderivative depends on whether or not 
the function f can be so simplified that one may be recognized by using tables
of derivatives in reverse: that is, matching the desired derivative f with one
in the table, and reading backwards to deduce an antiderivative. Thus, to
find the antiderivative of 3 sec x tan x, we first glean from Table 5-1 that

$$\frac{d}{dx}(\sec x) = \sec x \tan x$$

or, equivalently,

$$\frac{d}{dx}(3 \sec x) = 3 \sec x \tan x,$$

showing that the antiderivative is

$$\int 3 \sec x \tan x\,dx = 3 \sec x + C.$$

In colloquial terms, the process of finding the most general antiderivative of
the function f(x) is called the 'integration of f(x)'.

Table 8-1 gives a preliminary working list of important integrals which 
has been compiled from the tables of derivatives in Chapters 5 and 6. 

The two separate results shown against number 3 are usually contracted to

$$\int \frac{dx}{x} = \log |x| + C,$$


with the tacit understanding that the arbitrary constant C differs according 
as x is positive or negative. With obvious modifications, this convention will 
be extended to include all integrals involving the logarithmic function. 
Specific examples involving this convention are to be found in Problems 
8-1-8-3. 

The following statement is equivalent to both Eqn (8-1) and Eqn (8-2), 
and it arises as a direct consequence of the definition of an antiderivative. 
We formulate it as a general theorem. 



Table 8-1 Basic table for integrals

1. $\int x^n\,dx = \frac{x^{n+1}}{n+1} + C$  (n ≠ -1);

2. $\int a^x\,dx = \frac{a^x}{\log a} + C$  (a > 0);

3. $\int \frac{dx}{x} = \log x + C$ for x > 0, and $\int \frac{dx}{x} = \log(-x) + C$ for x < 0;

4. $\int e^{ax}\,dx = \frac{1}{a} e^{ax} + C$  (a ≠ 0);

5. $\int \cos ax\,dx = \frac{1}{a} \sin ax + C$  (a ≠ 0);

6. $\int \sin ax\,dx = -\frac{1}{a} \cos ax + C$  (a ≠ 0);

7. $\int \frac{dx}{\sqrt{a^2 - x^2}} = \arcsin\frac{x}{a} + C$  for |x| < |a|;

8. $\int \frac{dx}{a^2 + x^2} = \frac{1}{a} \arctan\frac{x}{a} + C$  (a ≠ 0);

9. $\int \frac{dx}{\sqrt{a^2 + x^2}} = \mathrm{arcsinh}\,\frac{x}{a} + C$;

10. $\int \frac{dx}{\sqrt{x^2 - a^2}} = \mathrm{arccosh}\,\frac{x}{a} + C$ for x > a, and $\int \frac{dx}{\sqrt{x^2 - a^2}} = -\mathrm{arccosh}\left(-\frac{x}{a}\right) + C$ for x < -a;

11. $\int \frac{dx}{a^2 - x^2} = \frac{1}{a}\,\mathrm{arctanh}\,\frac{x}{a} + C$  for |x| < |a|;

12. $\int \frac{dx}{x^2 - a^2} = -\frac{1}{a}\,\mathrm{arccoth}\,\frac{x}{a} + C$  for |x| > |a|.



THEOREM 8-1 

$$\frac{d}{dx} \int f(x)\,dx = f(x).$$

In words, this general result merely asserts the obvious fact that the
derivative of the antiderivative of a function f(x) is the function f(x) itself.
Its most frequent application is probably to the verification of antiderivatives. 
For example, let us use the theorem to verify the antiderivative 



$$\int \frac{g'\,dx}{\sqrt{a^2 - g^2}} = \arcsin\left(\frac{g}{a}\right) + C,$$    (A)

where g = g(x) is some differentiable function of x and |g| < a.
By Theorem 8-1 we must have

$$\frac{d}{dx} \int \frac{g'\,dx}{\sqrt{a^2 - g^2}} = \frac{g'}{\sqrt{a^2 - g^2}}.$$    (B)

Now, differentiating the right-hand side of (A) we find

$$\frac{d}{dx}\left[\arcsin\left(\frac{g}{a}\right) + C\right] = \frac{1}{\sqrt{1 - (g/a)^2}} \cdot \frac{g'}{a} = \frac{g'}{\sqrt{a^2 - g^2}},$$

which is identical with (B). Thus, (A) is verified.

A final general result of great value is the fact that the derivative of a 
linear combination of functions is equal to the same linear combination of 
their derivatives (Theorem 5-4). Expressed in terms of antiderivatives this 
implies the following general theorem. 

THEOREM 8-2

$$\int (k_1 f + k_2 g)\,dx = k_1 \int f\,dx + k_2 \int g\,dx.$$

It is, of course, this theorem that permits us to simplify many expressions
to the point at which antiderivatives may be deduced from tables of standard
integrals (antiderivatives) such as Table 8-1. Hence we have

$$\int (5x^2 - 2 \cos x)\,dx = 5 \int x^2\,dx - 2 \int \cos x\,dx = \frac{5x^3}{3} - 2 \sin x + C.$$

The separate arbitrary constants associated with each of the antiderivatives 
on the right-hand side have, of course, been combined into the single arbitrary 
constant C. 
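
Theorem 8-1 suggests a quick numerical sanity check of a result such as the one above: differentiate the proposed antiderivative and compare with the integrand. The following is a minimal sketch, not part of the original text; the helper names `central_difference` and `max_mismatch`, the step size and the sample points are illustrative assumptions.

```python
import math


def central_difference(F, x, h=1e-5):
    """Approximate F'(x) by a symmetric difference quotient (assumed step h)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)


def integrand(x):
    # The function whose antiderivative was sought: 5x^2 - 2 cos x.
    return 5.0 * x ** 2 - 2.0 * math.cos(x)


def candidate(x):
    # Antiderivative obtained from Theorem 8-2 and Table 8-1, with C = 0.
    return 5.0 * x ** 3 / 3.0 - 2.0 * math.sin(x)


def max_mismatch(F, f, points):
    """Largest |F'(x) - f(x)| over the sample points."""
    return max(abs(central_difference(F, x) - f(x)) for x in points)


if __name__ == "__main__":
    sample_points = [-2.0, -0.5, 0.0, 1.0, 3.0]
    print(max_mismatch(candidate, integrand, sample_points))
```

A small mismatch at a handful of points does not prove the result, but a large one reliably exposes a slip in sign or coefficient.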

The remaining sections of this chapter are concerned with outlining the 
details of the main techniques available for finding antiderivatives. 



8-2 Integration by substitution

Possibly the most frequently used technique of integration is that in which 
the variable under the integral sign is changed in a manner which simplifies 
the task of finding the antiderivative. This process is known as integration by 
substitution or integration by change of variable. It is in this technique that 




the full significance of the symbol dx in Eqn (8-1) is first realized. Indeed, by 
making a straightforward application of the chain rule for differentiation 
(Theorem 5-7) we shall arrive at a simple mechanical rule for effecting a 
variable change by using differentials. 

Because composite functions (functions of a function) of x often occur 
under the integral sign we shall consider a general antiderivative of the form 

$$I = \int k(x)\, f[g(x)]\,dx.$$

In order to cover all likely cases we shall consider the effect on I of
changing the variable x to the variable u, where x and u are related by

$$g(x) = h(u),$$    (8-4)

with f, g differentiable functions.
Let us start by supposing that

$$I = \int k(x)\, f[g(x)]\,dx = F(x) + C,$$    (8-5)

so that we know

$$\frac{dF}{dx} = k(x)\, f[g(x)].$$    (8-6)

Applying the chain rule to F(x) gives

$$\frac{dF(x)}{du} = \frac{dF}{dx} \cdot \frac{dx}{du},$$

which, by virtue of Eqn (8-6), may be written

$$\frac{dF(x)}{du} = k(x)\, f[g(x)]\, \frac{dx}{du}.$$

On the assumption that Eqn (8-4) may be solved for x in the form

$$x = g^{-1}[h(u)]$$    (8-7)

we arrive at the result

$$\frac{dF(x)}{du} = k\{g^{-1}[h(u)]\}\, f[h(u)]\, \frac{dx}{du}.$$    (8-8)

Now by implicit differentiation (Corollary 5-19 (a)) of Eqn (8-4), it follows
that provided g'(x) ≠ 0,

$$\frac{dx}{du} = \frac{h'(u)}{g'(x)},$$

so that

$$\frac{dF(x)}{du} = \frac{k\{g^{-1}[h(u)]\}\, f[h(u)]\, h'(u)}{g'\{g^{-1}[h(u)]\}}.$$    (8-9)



However, Eqn (8-9) simply asserts that F(x) is an indefinite integral of

$$\frac{k\{g^{-1}[h(u)]\}\, f[h(u)]\, h'(u)}{g'\{g^{-1}[h(u)]\}}$$

and thus taking the antiderivative yields

$$\int \frac{k\{g^{-1}[h(u)]\}\, f[h(u)]\, h'(u)}{g'\{g^{-1}[h(u)]\}}\,du = F(x) + C^*,$$    (8-10)

where C* is an arbitrary constant.

Comparing Eqns (8-5) and (8-10), and on the understanding that two
antiderivatives are equal if they only differ by a constant, we have thus proved
that

$$\int k(x)\, f[g(x)]\,dx = \int \frac{k\{g^{-1}[h(u)]\}\, f[h(u)]\, h'(u)}{g'\{g^{-1}[h(u)]\}}\,du.$$    (8-11)

This forms the result of the following theorem: 

THEOREM 8-3 (integration by substitution) If g, h are differentiable
functions and g(x) = h(u), with g'(x) ≠ 0 and $x = g^{-1}[h(u)]$, then

$$\int k(x)\, f[g(x)]\,dx = \int \frac{k\{g^{-1}[h(u)]\}\, f[h(u)]\, h'(u)}{g'\{g^{-1}[h(u)]\}}\,du.$$

Two special cases occur when (a) k(x) ≡ 1 and g(x) ≡ x, so that g'(x)
= 1, and (b) k(x) ≡ 1 and h(u) ≡ u, so that h'(u) = 1. These are stated as
Corollaries 8-3 (a, b) below, which are the results most often to be found in
textbooks.



Corollary 8-3 (a) If x = h(u) is a differentiable function of u, then

$$\int f(x)\,dx = \int f[h(u)]\, h'(u)\,du.$$

In terms of the differential relationship dh = h'(u)du this is also capable of
expression in the form

$$\int f(x)\,dx = \int f(h)\,dh.$$

Corollary 8-3 (b) If g(x) = u is a differentiable function of x, with
g'(x) ≠ 0, then

$$\int f[g(x)]\,dx = \int f(u) \left(\frac{dx}{du}\right) du$$

or, as $dx/du = 1/g'[g^{-1}(u)]$,

$$\int f[g(x)]\,dx = \int \frac{f(u)}{g'[g^{-1}(u)]}\,du.$$

All of these results may be conveniently summarized in the form of a single
simple mechanical rule for changing the variable in an antiderivative.

Rule 1 (Integration by substitution)

We suppose that in the antiderivative

$$I = \int k(x)\, f[g(x)]\,dx$$

it is required to change from the variable x to the variable u by means of the
relationship g(x) = h(u), where g and h are differentiable functions, with
g'(x) ≠ 0. The result may be deduced from I above by:

(a) replacing g(x) in f[g(x)] by h(u);

(b) solving g(x) = h(u) for x in the form $x = g^{-1}[h(u)]$ and then replacing
x in k(x) by this result;

(c) replacing dx by [h'(u)/g'(x)]du, as obtained from the differential
relationship g'(x)dx = h'(u)du;

(d) replacing x in g'(x) by $x = g^{-1}[h(u)]$.

We now illustrate the application of this rule in a series of examples.
Unfortunately, although the rule tells us how to change the variable, it offers
us no information on the type of variable change that should be made. That
is to say it does not tell us the functional form of f and g. Only experience
can help here. 

Example 8-1 Evaluate the antiderivative

$$I = \int x^3 \sqrt{1 + x^2}\,dx.$$

Solution This antiderivative is of the most general type contained in
Theorem 8-3. First we make the obvious identification $k(x) = x^3$ and then,
to remove the square root function which is difficult to manipulate, we shall
try setting

$$1 + x^2 = u^2.$$

That is to say, in the hope that it will lead to a simpler expression, we make
the further identifications

$$g(x) = 1 + x^2 \quad\text{and}\quad h(u) = u^2.$$

The function f in Theorem 8-3 then becomes the square root function, with
$\sqrt{1 + x^2} = u$. Rather than solving for x, for the moment we shall use the
result $x^3 = x \cdot x^2 = x(u^2 - 1)$, when we find $x^3\sqrt{1 + x^2} = xu(u^2 - 1)$.
Now g'(x) = 2x and h'(u) = 2u, so that the differential relation g'(x)dx
= h'(u)du gives rise to x dx = u du. Hence, in differential form,

$$x^3 \sqrt{1 + x^2}\,dx = u(u^2 - 1)\,x\,dx = u^2(u^2 - 1)\,du,$$

and so by the rule derived from Theorem 8-3,

$$I = \int x^3 \sqrt{1 + x^2}\,dx = \int u^2(u^2 - 1)\,du.$$

The antiderivative on the right-hand side is now straightforward and may be
integrated on sight to give

$$I = \frac{u^5}{5} - \frac{u^3}{3} + C$$

or,

$$I = \frac{(1 + x^2)^{5/2}}{5} - \frac{(1 + x^2)^{3/2}}{3} + C.$$



Example 8-2 Evaluate the antiderivative 

$$I = \int \sqrt{1 + x^2}\,dx.$$

Solution In this antiderivative k(x) = 1, but it is not immediately clear how
best to change the variable. It is left to the reader to see why neither of the
possible substitutions $u^2 = 1 + x^2$ or $u = 1 + x^2$ brings about any effective
simplification. Instead, let us seek to remove the square root by making the
substitution x = sinh u, so that the problem becomes analogous to Corollary
8-3 (a). Then $1 + x^2 = 1 + \sinh^2 u = \cosh^2 u$, so that $\sqrt{1 + x^2} = \cosh u$.
Next, as g(x) = x and h(u) = sinh u, g'(x) = 1, h'(u) = cosh u and so
dx = cosh u du. Applying the rule then gives

$$\sqrt{1 + x^2}\,dx = \cosh u \cdot \cosh u\,du = \cosh^2 u\,du,$$

whence

$$I = \int \cosh^2 u\,du.$$

Now use the identity $\cosh^2 u = \tfrac{1}{2}(\cosh 2u + 1)$ to give

$$I = \tfrac{1}{2} \int (\cosh 2u + 1)\,du = \tfrac{1}{4} \sinh 2u + \frac{u}{2} + C.$$

To return to the variable x it is necessary to use the results u = arcsinh x,
$\cosh u = \sqrt{1 + x^2}$ together with the identity sinh 2u = 2 sinh u cosh u
to obtain

$$I = \tfrac{1}{2}\left[x\sqrt{1 + x^2} + \mathrm{arcsinh}\,x\right] + C.$$
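
The closed form of Example 8-2 can be compared with a direct numerical quadrature by way of Eqn (8-3). The sketch below is an illustration only; the composite Simpson rule, the interval [0, 2] and the names `simpson` and `F` are assumptions, not anything prescribed by the text.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n subintervals (n forced even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def F(x):
    # Antiderivative found in Example 8-2, taking C = 0.
    return 0.5 * (x * math.sqrt(1.0 + x * x) + math.asinh(x))


# Definite integral of sqrt(1 + x^2) over [0, 2], computed two ways.
numeric = simpson(lambda x: math.sqrt(1.0 + x * x), 0.0, 2.0)
exact = F(2.0) - F(0.0)

if __name__ == "__main__":
    print(numeric, exact)
```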



Example 8-3 Evaluate the antiderivative

$$I = \int \cos(1 + 3x)\,dx.$$

Solution This antiderivative has k(x) = 1, and by setting 1 + 3x = u so
that g(x) = 1 + 3x, h(u) = u it reduces to the situation of Corollary 8-3 (b).
Applying the rule we find that cos(1 + 3x) = cos u and 3dx = du, whence

$$I = \int \tfrac{1}{3} \cos u\,du = \tfrac{1}{3} \sin u + C,$$

and thus

$$I = \tfrac{1}{3} \sin(1 + 3x) + C.$$

Example 8-4 Evaluate the antiderivative

$$I = \int 2x \sqrt{1 + x^2}\,dx.$$

Solution Setting $u = 1 + x^2$ it follows that du = 2x dx, so that

$$2x \sqrt{1 + x^2}\,dx = \sqrt{u}\,du,$$

whence

$$I = \int \sqrt{u}\,du = \tfrac{2}{3} u^{3/2} + C = \tfrac{2}{3}(1 + x^2)^{3/2} + C.$$

It is interesting to notice that when the situation found in Example 8-4 is 
expressed in terms of Theorem 8-3 by making the identification k(x) = g'(x) 
and then setting u = g(x) it gives rise to the general result 

$$\int g'(x)\, f[g(x)]\,dx = \int f(u)\,du.$$    (8-12)

This is not, of course, a new result since it is no more than the statement 
of Corollary 8-3 (a) with the roles of x and u interchanged. 

It is an immediate consequence of Eqn (8-3) that Theorem 8-3, together 
with its corollaries, also applies to definite integrals provided that the limits 
are also transformed by the same transformation law. The restatement of 
Theorem 8-3 in terms of definite integrals is as follows: 

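THEOREM 8-4 (integration of definite integrals by substitution) If g, h are
differentiable functions and g(x) = h(u), with g'(x) ≠ 0 and $x = g^{-1}[h(u)]$,
$u = h^{-1}[g(x)]$, then

$$\int_a^b k(x)\, f[g(x)]\,dx = \int_{h^{-1}[g(a)]}^{h^{-1}[g(b)]} \frac{k\{g^{-1}[h(u)]\}\, f[h(u)]\, h'(u)}{g'\{g^{-1}[h(u)]\}}\,du.$$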

One specially simple case of this theorem merits recording in the form of a 
corollary. It is the result corresponding to Eqn (8-12) and is obtained by
making the identifications k(x) = g'(x), u = g(x).

Corollary 8-4 If u = g(x) is a differentiable function, then

$$\int_a^b f[g(x)]\, g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du.$$



When expressed in the form of a mechanical rule, Theorem 8-4 is as 
straightforward to apply as was our previous rule. 

Rule 2 (Integrating definite integrals by substitution) 

We suppose that in the definite integral

$$I = \int_a^b k(x)\, f[g(x)]\,dx$$

it is required to change from the variable x to the variable u by means of the
relationship g(x) = h(u), where g and h are differentiable functions, with
g'(x) ≠ 0. The result may be deduced from I above by:

(a) transforming the differential expression k(x) f[g(x)]dx as indicated
in Rule 1;

(b) solving g(x) = h(u) for u in the form $u = h^{-1}[g(x)]$ and replacing
the upper limit b by $h^{-1}[g(b)]$ and the lower limit a by $h^{-1}[g(a)]$.

Example 8-5 Evaluate the definite integral 

$$I = \int_0^1 x^2 \sqrt{1 - x^2}\,dx.$$

Solution Let us make the substitution x = sin u, so that dx = cos u du,
when

$$x^2 \sqrt{1 - x^2}\,dx = \sin^2 u \cdot \cos u \cdot \cos u\,du = \sin^2 u \cos^2 u\,du.$$

Then, as u = arcsin x, using the principal branch of the inverse sine function, we
find from Rule 2 that

$$\int_0^1 x^2 \sqrt{1 - x^2}\,dx = \int_{\arcsin 0}^{\arcsin 1} \sin^2 u \cos^2 u\,du = \int_0^{\pi/2} \sin^2 u \cos^2 u\,du.$$

To evaluate this last definite integral we use a technique from Chapter 6
which is often helpful. From Definition 6-6 we may write

$$\sin^2 u \cos^2 u = \left(\frac{e^{iu} - e^{-iu}}{2i}\right)^2 \left(\frac{e^{iu} + e^{-iu}}{2}\right)^2 = \left(\frac{e^{2iu} - 2 + e^{-2iu}}{-4}\right) \left(\frac{e^{2iu} + 2 + e^{-2iu}}{4}\right) = \frac{e^{4iu} - 2 + e^{-4iu}}{-16},$$

and thus

$$\sin^2 u \cos^2 u = \tfrac{1}{8}(1 - \cos 4u).$$

Using this result in the definite integral, which may then be evaluated on
sight, we finally obtain

$$I = \tfrac{1}{8} \int_0^{\pi/2} (1 - \cos 4u)\,du = \tfrac{1}{8}\left[u - \tfrac{1}{4} \sin 4u\right]_0^{\pi/2} = \frac{\pi}{16},$$

and so

$$\int_0^1 x^2 \sqrt{1 - x^2}\,dx = \tfrac{1}{16}\pi.$$
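
Rule 2 can be illustrated numerically on Example 8-5 by evaluating the integral both before and after the change of variable, with the limits transformed as the rule requires. This is a hedged sketch: the Simpson rule, the subdivision count and the names `original` and `transformed` are illustrative choices. The original integrand has an unbounded derivative at x = 1, so its quadrature converges noticeably more slowly than that of the smooth transformed integrand.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n subintervals (n forced even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def original(x):
    # x^2 sqrt(1 - x^2); max() guards against a tiny negative argument near x = 1.
    return x * x * math.sqrt(max(0.0, 1.0 - x * x))


def transformed(u):
    # Integrand after the substitution x = sin u.
    return math.sin(u) ** 2 * math.cos(u) ** 2


# Limits 0..1 in x become arcsin 0 .. arcsin 1, that is 0..pi/2, in u.
I_original = simpson(original, 0.0, 1.0)
I_transformed = simpson(transformed, math.asin(0.0), math.asin(1.0))

if __name__ == "__main__":
    print(I_original, I_transformed, math.pi / 16)
```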



Example 8-6 Evaluate the definite integral

$$I = \int_0^1 (2x + 5) \cosh(x^2 + 5x + 1)\,dx.$$

Solution Inspection shows that this example is of the form of Corollary 8-4,
with the function f ≡ cosh and $g(x) = x^2 + 5x + 1$.

As g(0) = 1, g(1) = 7, by setting u = g(x) we at once obtain

$$I = \int_1^7 \cosh u\,du = \sinh 7 - \sinh 1.$$

8-3 Integration by parts 

This most valuable technique is based on Theorem 5-5, concerning the 
derivative of the product of two functions. That theorem asserts that if f, g
are two differentiable functions of x, then

$$\frac{d}{dx}[f(x) g(x)] = [f(x) g'(x)] + [f'(x) g(x)].$$

Taking the antiderivative of this result gives

$$f(x) g(x) = \int f(x) g'(x)\,dx + \int g(x) f'(x)\,dx$$

which, on rearrangement, becomes

$$\int f(x) g'(x)\,dx = f(x) g(x) - \int g(x) f'(x)\,dx.$$    (8-13)



This is one form of the required result. Using the differential notation
df = f'(x)dx, dg = g'(x)dx enables this to be contracted to the equivalent
and easily remembered alternative form

$$\int f\,dg = fg - \int g\,df.$$    (8-14)

These results are now formulated as our next theorem:

THEOREM 8-5 (integration by parts) If f, g are differentiable functions of x,
then

$$\int f(x) g'(x)\,dx = f(x) g(x) - \int g(x) f'(x)\,dx$$

or, expressed in differential notation,

$$\int f\,dg = fg - \int g\,df.$$

This useful theorem is the nearest possible approach to a general theorem 
for finding the antiderivative of the product of two functions. It depends on 
the fact that often the antiderivative $\int g\,df$ is easier to determine than the
antiderivative $\int f\,dg$. Naturally, the technique of integration by substitution
can also be employed when evaluating $\int g\,df$.

When definite integrals are involved it is not difficult to see that the result
is still valid provided the limits are also applied to the product fg. The general
result is as follows: 

THEOREM 8-6 (integration by parts: definite integral) If f, g are
differentiable functions of x in [a, b], then

$$\int_a^b f(x) g'(x)\,dx = \Big[f(x) g(x)\Big]_a^b - \int_a^b g(x) f'(x)\,dx = [f(b) g(b)] - [f(a) g(a)] - \int_a^b g(x) f'(x)\,dx.$$

As before, we illustrate both of these theorems by means of a series of 
examples. These have been carefully chosen to demonstrate a variety of 
situations in which integration by parts is useful. 

Example 8-7 Evaluate the antiderivative

$$I = \int x^k \log x\,dx \quad\text{for } x > 0,\ k \neq -1.$$

Solution The problem here, as with all applications of the technique of
integration by parts, is to decide upon the functions f and g. A little
experimentation will soon convince the reader that I will only simplify if we set
f(x) = log x and $g(x) = x^{k+1}/(k + 1)$, for then $g'(x) = x^k$ and f'(x) = 1/x.
Accordingly we write I in the form

$$I = \int \log x\; d\left[\frac{x^{k+1}}{k + 1}\right].$$

Applying Theorem 8-5 gives

$$I = \frac{x^{k+1} \log x}{k + 1} - \int \frac{x^{k+1}}{k + 1} \cdot \frac{1}{x}\,dx = \frac{x^{k+1} \log x}{k + 1} - \frac{x^{k+1}}{(k + 1)^2} + C.$$

Example 8-8 Evaluate the definite integral

$$I = \int_0^{1/2} \arcsin x\,dx.$$

Solution This time we make the identifications f(x) = arcsin x and g(x) = x
and write

$$\int_0^{1/2} \arcsin x\; d[x] = \Big[x \arcsin x\Big]_0^{1/2} - \int_0^{1/2} \frac{x\,dx}{\sqrt{1 - x^2}}.$$    (A)

We have

$$\Big[x \arcsin x\Big]_0^{1/2} = \pi/12 - 0 = \pi/12$$

but the definite integral on the right-hand side is still not recognizable. To
simplify it let us now set $u = 1 - x^2$ so that $x\,dx = -\tfrac{1}{2}\,du$; using Theorem
8-4 we obtain

$$\int_0^{1/2} \frac{x\,dx}{\sqrt{1 - x^2}} = -\frac{1}{2} \int_1^{3/4} \frac{du}{\sqrt{u}} = -\frac{1}{2}\Big[2u^{1/2}\Big]_1^{3/4} = 1 - \frac{\sqrt{3}}{2}.$$

Combining this result with (A) gives

$$\int_0^{1/2} \arcsin x\,dx = \pi/12 + \frac{\sqrt{3}}{2} - 1.$$
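
The two pieces produced by integration by parts in Example 8-8 can each be checked numerically. The sketch below assumes a simple composite midpoint rule (the name `midpoint` and the subdivision count are illustrative) and compares the boundary term and the remaining integral with the values obtained above.

```python
import math


def midpoint(f, a, b, n=2000):
    """Composite midpoint rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Left-hand side: direct quadrature of arcsin x over [0, 1/2].
numeric = midpoint(math.asin, 0.0, 0.5)

# Right-hand side of (A): boundary term minus the remaining integral.
boundary = 0.5 * math.asin(0.5) - 0.0 * math.asin(0.0)
remaining = midpoint(lambda x: x / math.sqrt(1.0 - x * x), 0.0, 0.5)

# Closed form obtained in the example.
closed_form = math.pi / 12 + math.sqrt(3) / 2 - 1

if __name__ == "__main__":
    print(numeric, boundary - remaining, closed_form)
```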

Example 8-9 Evaluate the antiderivative

$$I = \int e^{ax} \sin bx\,dx.$$

Solution This time we choose to make the identification f(x) = sin bx,
$g(x) = (1/a)e^{ax}$ and to write I in the form

$$I = \int \sin bx\; d\left[\frac{1}{a} e^{ax}\right].$$

Integrating by parts we find

$$\int e^{ax} \sin bx\,dx = \frac{1}{a} e^{ax} \sin bx - \frac{b}{a} \int e^{ax} \cos bx\,dx.$$

Now let us use this same device on the second term above to obtain

$$\int e^{ax} \sin bx\,dx = \frac{1}{a} e^{ax} \sin bx - \frac{b}{a} \int \cos bx\; d\left[\frac{1}{a} e^{ax}\right]$$

$$= \frac{1}{a} e^{ax} \sin bx - \frac{b}{a^2} e^{ax} \cos bx - \frac{b^2}{a^2} \int e^{ax} \sin bx\,dx + C.$$

Combining terms gives

$$\left(1 + \frac{b^2}{a^2}\right) \int e^{ax} \sin bx\,dx = \frac{e^{ax}(a \sin bx - b \cos bx)}{a^2} + C,$$

and so

$$\int e^{ax} \sin bx\,dx = \frac{e^{ax}(a \sin bx - b \cos bx)}{a^2 + b^2} + C^*$$



where C* is related to C by C* = a 2 C/(a 2 + b 2 ). In fact there is no necessity 
to distinguish between C and C*, since as C was an arbitrary constant of 
integration, C* is also an arbitrary constant. For this reason it is not 
customary to redefine arbitrary constants when, as above, they are simply 
multiplied by a constant factor. 
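
An independent cross-check of the result of Example 8-9, not used in the text, is to regard sin bx as the imaginary part of e^{ibx}, so that the integrand is the imaginary part of e^{(a+ib)x}, whose antiderivative is e^{(a+ib)x}/(a + ib). The sketch below compares the two routes at a few points; the function names and test values are illustrative assumptions.

```python
import cmath
import math


def by_parts_result(x, a, b):
    """Antiderivative from Example 8-9 with the arbitrary constant set to zero."""
    return math.exp(a * x) * (a * math.sin(b * x) - b * math.cos(b * x)) / (a * a + b * b)


def complex_route(x, a, b):
    """Imaginary part of exp((a + ib)x) / (a + ib), an alternative derivation."""
    z = complex(a, b)
    return (cmath.exp(z * x) / z).imag


if __name__ == "__main__":
    for x, a, b in [(0.7, 1.5, 2.0), (-1.2, 0.5, 3.0), (2.0, -0.3, 1.0)]:
        print(by_parts_result(x, a, b), complex_route(x, a, b))
```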

8-4 Reduction formulae 

It not infrequently happens that an antiderivative / involving a parameter m 
may be reduced by means of the technique of integration by parts to an 
expression in which the parameter has a value differing by an integer k from 
its original value. If we denote such an antiderivative by $I_m$, then a typical
situation is the one in which we arrive at an expression of the form

$$I_m = A(m) + I_{m-1},$$    (8-15)

where A(m) is some known function.

Expressions of this form provide an algorithm for the computation of any
antiderivative of the given type once one of them is known, for the $I_m$ are
then defined recursively by this relation in terms of $I_1$, say. It is customary to
refer to expressions of the general form of Eqn (8-15) as reduction formulae. 
The same idea is equally applicable, without essential modification, to definite 
integrals. 

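Example 8-10 Determine the reduction formula for

$$I_m = \int \cos^m \theta\,d\theta.$$

Use the result to determine $I_7$.

Solution We rewrite $I_m$ as follows and use integration by parts.

$$I_m = \int \cos^{m-1}\theta\; d(\sin\theta)$$

$$= \cos^{m-1}\theta \sin\theta - \int \sin\theta \cdot (m - 1) \cos^{m-2}\theta\,(-\sin\theta)\,d\theta$$

$$= \cos^{m-1}\theta \sin\theta + (m - 1) \int \cos^{m-2}\theta \sin^2\theta\,d\theta$$

$$= \cos^{m-1}\theta \sin\theta + (m - 1) \int \cos^{m-2}\theta\,(1 - \cos^2\theta)\,d\theta$$

$$= \cos^{m-1}\theta \sin\theta + (m - 1) \int \cos^{m-2}\theta\,d\theta - (m - 1) \int \cos^m\theta\,d\theta.$$

Recalling the definition of $I_m$ we discover that this may be re-expressed in
terms of $I_m$ and $I_{m-2}$ as

$$I_m = \cos^{m-1}\theta \sin\theta + (m - 1) I_{m-2} - (m - 1) I_m,$$

whence we arrive at the required reduction formula

$$I_m = \frac{\cos^{m-1}\theta \sin\theta}{m} + \left(\frac{m - 1}{m}\right) I_{m-2}.$$

Setting m = 7 gives

$$I_7 = \frac{\cos^6\theta \sin\theta}{7} + \frac{6}{7} I_5$$

$$= \frac{\cos^6\theta \sin\theta}{7} + \frac{6}{7}\left(\frac{\cos^4\theta \sin\theta}{5} + \frac{4}{5} I_3\right)$$

$$= \frac{\cos^6\theta \sin\theta}{7} + \frac{6}{35} \cos^4\theta \sin\theta + \frac{24}{35}\left(\frac{\cos^2\theta \sin\theta}{3} + \frac{2}{3} I_1\right).$$

As $I_1 = \int \cos\theta\,d\theta = \sin\theta + C$ this gives the result

$$\int \cos^7\theta\,d\theta = \frac{1}{7} \cos^6\theta \sin\theta + \frac{6}{35} \cos^4\theta \sin\theta + \frac{8}{35} \cos^2\theta \sin\theta + \frac{16}{35} \sin\theta + C.$$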
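
The reduction formula of Example 8-10 is naturally expressed as a recursion. The following is a minimal sketch under stated assumptions: the arbitrary constant is taken as zero, the recursion is started from I_0 = θ and I_1 = sin θ, and the names `I_m` and `I7_explicit` are illustrative. It evaluates I_7 recursively and compares it with the expanded expression obtained above.

```python
import math


def I_m(m, theta):
    """Antiderivative of cos^m(theta) via the reduction formula (constant = 0)."""
    if m == 0:
        return theta
    if m == 1:
        return math.sin(theta)
    leading = math.cos(theta) ** (m - 1) * math.sin(theta) / m
    return leading + (m - 1) / m * I_m(m - 2, theta)


def I7_explicit(theta):
    """Expanded form of I_7 from Example 8-10 (constant = 0)."""
    c, s = math.cos(theta), math.sin(theta)
    return c ** 6 * s / 7 + 6 * c ** 4 * s / 35 + 8 * c ** 2 * s / 35 + 16 * s / 35


if __name__ == "__main__":
    for theta in (0.3, 1.0, 2.5):
        print(I_m(7, theta), I7_explicit(theta))
```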



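Example 8-11 Evaluate the definite integral

$$J_m = \int_0^{\pi/2} \cos^m\theta\,d\theta = \int_0^{\pi/2} \sin^m\theta\,d\theta.$$

Solution We can make use of the reduction formula determined in the
previous example. It follows from

$$I_m = \frac{\cos^{m-1}\theta \sin\theta}{m} + \left(\frac{m - 1}{m}\right) I_{m-2}$$

that the definite integral $J_m$ obeys the reduction formula

$$J_m = \left[\frac{\cos^{m-1}\theta \sin\theta}{m}\right]_0^{\pi/2} + \left(\frac{m - 1}{m}\right) J_{m-2} = \left(\frac{m - 1}{m}\right) J_{m-2}.$$

We must now consider separately even and odd values of m. Firstly, if m is
even, so that we may write m = 2n, then

$$J_{2n} = \frac{2n - 1}{2n} \cdot \frac{2n - 3}{2n - 2} \cdots \frac{1}{2}\, J_0.$$

Secondly, if m is odd, so that we may write m = 2n + 1, then

$$J_{2n+1} = \frac{2n}{2n + 1} \cdot \frac{2n - 2}{2n - 1} \cdots \frac{2}{3}\, J_1.$$

Hence, as $J_0 = \int_0^{\pi/2} d\theta = \tfrac{1}{2}\pi$ and $J_1 = \int_0^{\pi/2} \cos\theta\,d\theta = 1$, we
obtain:

$$J_{2n} = \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2 \cdot 4 \cdot 6 \cdots 2n} \cdot \frac{\pi}{2},$$

$$J_{2n+1} = \frac{2 \cdot 4 \cdot 6 \cdots 2n}{3 \cdot 5 \cdot 7 \cdots (2n + 1)}.$$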
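
The recursion J_m = ((m - 1)/m) J_{m-2} of Example 8-11 can be carried out exactly with rational arithmetic, keeping the factor π/2 (even m) or 1 (odd m) separate. This is an illustrative sketch; the function names `J` and `J_value` and the return convention are assumptions made for the example.

```python
import math
from fractions import Fraction


def J(m):
    """Return (q, is_even) such that J_m = q * pi/2 if is_even, else J_m = q."""
    q = Fraction(1)
    while m >= 2:
        q *= Fraction(m - 1, m)  # one step of J_m = ((m - 1)/m) J_{m-2}
        m -= 2
    # The loop ends at m = 0 (start value J_0 = pi/2) or m = 1 (J_1 = 1).
    return q, m == 0


def J_value(m):
    """Floating-point value of J_m."""
    q, is_even = J(m)
    return float(q) * (math.pi / 2 if is_even else 1.0)


if __name__ == "__main__":
    for m in range(8):
        print(m, J(m), J_value(m))
```

For instance, `J(4)` corresponds to (3/4)(1/2)(π/2) and `J(5)` to (4/5)(2/3), in agreement with the closed forms for J_{2n} and J_{2n+1}.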



Finally let us prove that

$$J_m = \int_0^{\pi/2} \cos^m x\,dx = \int_0^{\pi/2} \sin^m x\,dx.$$

To achieve this make the variable change $x = \tfrac{1}{2}\pi - u$ in $J_m$ to obtain

$$\int_0^{\pi/2} \cos^m x\,dx = -\int_{\pi/2}^0 \cos^m(\tfrac{1}{2}\pi - u)\,du = \int_0^{\pi/2} \cos^m(\tfrac{1}{2}\pi - u)\,du = \int_0^{\pi/2} \sin^m u\,du.$$



This last result is of some interest historically, as it provided the first
infinite product representation for π. One form of the argument used to derive
this result proceeds as follows.

It is readily seen from the expressions for $J_{2n}$ and $J_{2n+1}$ that

$$\frac{\pi}{2} = \left[\frac{2 \cdot 4 \cdot 6 \cdots 2n}{3 \cdot 5 \cdots (2n - 1)}\right]^2 \cdot \frac{1}{2n + 1} \cdot \frac{J_{2n}}{J_{2n+1}}.$$    (8-16)

Now in the interval (0, ½π) the following inequalities hold:

$$\sin^{2n-1} x > \sin^{2n} x > \sin^{2n+1} x > 0,$$

so that as

$$J_m = \int_0^{\pi/2} \sin^m x\,dx,$$

it follows at once that

$$J_{2n-1} > J_{2n} > J_{2n+1} > 0.$$

This is equivalent to

$$\frac{J_{2n-1}}{J_{2n+1}} > \frac{J_{2n}}{J_{2n+1}} > 1,$$    (8-17)

but as

$$\frac{J_{2n-1}}{J_{2n+1}} = \frac{2n + 1}{2n}$$

we must have

$$\lim_{n \to \infty} \frac{J_{2n-1}}{J_{2n+1}} = 1.$$    (8-18)

By virtue of Eqns (8-17) and (8-18) it also follows that

$$\lim_{n \to \infty} \frac{J_{2n}}{J_{2n+1}} = 1.$$

So, taking the limit of Eqn (8-16) as n → ∞, we arrive at the expression

$$\frac{\pi}{2} = \lim_{n \to \infty} \left(\frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdot \frac{6}{5} \cdots \frac{2n - 2}{2n - 1} \cdot \frac{2n}{2n - 1} \cdot \frac{2n}{2n + 1}\right).$$    (8-19)

This famous result, called an infinite product, was first obtained by the
17th-century mathematician John Wallis. If $S_n$ denotes the nth partial product

$$S_n = \frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdots \frac{2n - 2}{2n - 1} \cdot \frac{2n}{2n - 1} \cdot \frac{2n}{2n + 1}$$

then the limit in Eqn (8-19) is to be interpreted to mean that $|\tfrac{1}{2}\pi - S_n| \to 0$
as n → ∞.
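
The convergence of the partial products S_n to ½π is slow, which a short computation makes visible. In the sketch below the function name `wallis_partial` is an assumption, and each pair of factors 2k/(2k - 1) and 2k/(2k + 1) is grouped into a single term.

```python
import math


def wallis_partial(n):
    """n-th partial product S_n = prod_{k=1..n} (2k)^2 / ((2k - 1)(2k + 1))."""
    s = 1.0
    for k in range(1, n + 1):
        s *= (2.0 * k) ** 2 / ((2 * k - 1) * (2 * k + 1))
    return s


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        s = wallis_partial(n)
        print(n, s, abs(math.pi / 2 - s))
```

The printed differences shrink only roughly in proportion to 1/n, so the product is of historical rather than computational interest.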

Reduction formulae may involve more than one parameter, as the final 
example illustrates. 

Example 8-12 Show that

$$I_{m,n} = \int \sin^m x \cos^n x\,dx$$

satisfies the reduction formula

$$(m + n) I_{m,n} = -\sin^{m-1} x \cos^{n+1} x + (m - 1) I_{m-2,n}.$$

Solution Write $I_{m,n}$ in the form shown below and integrate by parts.

$$I_{m,n} = \int \sin^{m-1} x \cos^n x\; d(-\cos x)$$

$$= -\sin^{m-1} x \cos^{n+1} x - \int (-\cos x)\left[(m - 1) \sin^{m-2} x \cos^{n+1} x - n \sin^m x \cos^{n-1} x\right] dx$$

$$= -\sin^{m-1} x \cos^{n+1} x + (m - 1) I_{m-2,n+2} - n I_{m,n}.$$

Next reduce $I_{m-2,n+2}$ to a simpler form by writing

$$I_{m-2,n+2} = \int \sin^{m-2} x \cos^{n+2} x\,dx = \int \sin^{m-2} x \cos^n x\,(1 - \sin^2 x)\,dx$$

which shows that

$$I_{m-2,n+2} = I_{m-2,n} - I_{m,n}.$$

Using this to eliminate $I_{m-2,n+2}$ from the previous result gives

$$I_{m,n} = -\sin^{m-1} x \cos^{n+1} x + (m - 1) I_{m-2,n} - (m - 1) I_{m,n} - n I_{m,n}$$

or,

$$(m + n) I_{m,n} = -\sin^{m-1} x \cos^{n+1} x + (m - 1) I_{m-2,n}.$$

8-5 Integration of rational functions — partial fractions 

It will be recalled from Chapter 2 that a rational fraction is a quotient
N(x)/D(x), in which N(x) and D(x) are polynomials. Antiderivatives of
rational fractions are often required and in this section we indicate ways of
expressing the fractions as the sum of simpler expressions, the antiderivatives
of which are either known or may be found by standard methods. Our
approach to the general problem of finding the antiderivative

$$I = \int \frac{N(x)}{D(x)}\,dx$$

will be to first consider some important special cases.

Case (a) Suppose that N(x) is of degree 0 and D(x) is a polynomial of
degree 1 and write

$$\frac{N(x)}{D(x)} = \frac{1}{cx + d}.$$

Then, making the substitution u = cx + d, we find

$$\int \frac{dx}{cx + d} = \frac{1}{c} \int \frac{du}{u} = \frac{1}{c} \log |u| + C$$

and so

$$\int \frac{dx}{cx + d} = \frac{1}{c} \log |cx + d| + C.$$

A similar argument establishes that

$$\int \frac{dx}{(cx + d)^n} = \frac{-1}{c(n - 1)} \cdot \frac{1}{(cx + d)^{n-1}} + C.$$



Case (b) Suppose N(x) is of degree 0 and D(x) is of degree 2 and write

$$\frac{N(x)}{D(x)} = \frac{1}{ax^2 + bx + c}.$$

Then completing the square in the denominator D(x) gives

$$ax^2 + bx + c = a\left[\left(x + \frac{b}{2a}\right)^2 + \left(\frac{c}{a} - \frac{b^2}{4a^2}\right)\right] = a\left[\left(x + \frac{b}{2a}\right)^2 + \alpha\right],$$

where $\alpha = (c/a) - (b^2/4a^2)$ may be positive, negative, or zero. Making the
variable change u = x + (b/2a) then shows that

$$I = \int \frac{dx}{ax^2 + bx + c} = \frac{1}{a} \int \frac{du}{u^2 + \alpha}.$$

This is a standard integral which may be identified from Table 8-1 once the
sign of α has been determined. It will involve either the function arctan or the
function arctanh.

Case (c) Suppose N(x) is of degree 1 and D(x) is of degree 2 and write

$$\frac{N(x)}{D(x)} = \frac{px + q}{ax^2 + bx + c}.$$

Then we can write

$$I = \int \frac{px + q}{ax^2 + bx + c}\,dx = \int \frac{(p/2a)(2ax + b) + [q - (pb/2a)]}{ax^2 + bx + c}\,dx$$

from which we find

$$I = \frac{p}{2a} \int \frac{2ax + b}{ax^2 + bx + c}\,dx + \left(\frac{2aq - pb}{2a}\right) \int \frac{dx}{ax^2 + bx + c}.$$

The second antiderivative is the one discussed in (b) above, and by setting
$u = ax^2 + bx + c$, the first antiderivative reduces to

$$\int \frac{2ax + b}{ax^2 + bx + c}\,dx = \int \frac{du}{u} = \log |u| + C = \log |ax^2 + bx + c| + C.$$

Combining this result with that of Case (b) then leads to the desired
antiderivative I.

Case (d) Suppose N(x) is of degree 1 and D(x) is a quadratic raised to the
power n > 1 and write

$$\frac{N(x)}{D(x)} = \frac{px + q}{(ax^2 + bx + c)^n}.$$

Then, using the identity

$$px + q = \left(\frac{p}{2a}\right)(2ax + b) + \left(q - \frac{pb}{2a}\right)$$

enables us to write

$$\int \frac{px + q}{(ax^2 + bx + c)^n}\,dx = \left(\frac{p}{2a}\right) \int \frac{2ax + b}{(ax^2 + bx + c)^n}\,dx + \left(q - \frac{pb}{2a}\right) \int \frac{dx}{(ax^2 + bx + c)^n}.$$

Setting $u = ax^2 + bx + c$ in the first antiderivative on the right-hand side
then leads to

$$\int \frac{2ax + b}{(ax^2 + bx + c)^n}\,dx = \int \frac{du}{u^n} = \left(\frac{-1}{n - 1}\right) \frac{1}{u^{n-1}} + C = \left(\frac{-1}{n - 1}\right) \frac{1}{(ax^2 + bx + c)^{n-1}} + C.$$

The second antiderivative on the right-hand side must be evaluated by means
of a reduction formula.

In the case n = 1 we have the obvious result

$$\int \frac{2ax + b}{ax^2 + bx + c}\,dx = \log |ax^2 + bx + c| + C.$$



Having considered a number of special cases we must now examine how 
we should proceed when D(x) is any polynomial with real coefficients, and 
the degree of the polynomial N(x) is less than that of D(x). The coefficient 
$a_0$ of the highest power of x in D(x) will be assumed to be unity, since if this
is not the case it can always be made so by division of N(x) and D(x) by $a_0$.
Now we know from Corollary 4-1 (b) that D(x) may be factorized into real
factors of the form

$$D(x) = (x - a)^k (x - b)^l \cdots (x^2 + px + q)^m,$$    (8-22)

where x = a, b, . . ., are real roots with multiplicities k, l, . . ., and
$(x^2 + px + q)^m$ represents an m-fold repeated pair of complex conjugate roots.

Then from elementary algebraic considerations it may be shown that when
the degree of N(x) is less than that of D(x) we may always set

$$\frac{N(x)}{D(x)} = \frac{A_1}{(x - a)} + \frac{A_2}{(x - a)^2} + \cdots + \frac{A_k}{(x - a)^k} + \frac{B_1}{(x - b)} + \frac{B_2}{(x - b)^2} + \cdots + \frac{B_l}{(x - b)^l} + \cdots$$

$$\qquad + \frac{P_1 x + Q_1}{(x^2 + px + q)} + \frac{P_2 x + Q_2}{(x^2 + px + q)^2} + \cdots + \frac{P_m x + Q_m}{(x^2 + px + q)^m}.$$    (8-23)

That is to say, every rational fraction may be expressed as a sum of simple 
fractions of the types whose antiderivatives were obtained in Cases (a) to (d). 

The expression on the right-hand side of Eqn (8-23) is called a partial
fraction expansion of the rational fraction N(x)/D(x) and the coefficients
$A_1, A_2, \ldots, P_m, Q_m$ are called undetermined coefficients. The undetermined
coefficients may be found by cross-multiplication of this expression, followed
by equating the coefficients of equal powers of x. Antiderivatives of rational
fractions N(x)/D(x) may thus be found by a combination of the method of
partial fractions and the results of Cases (a) to (d).

If the degree of N(x) exceeds that of D(x) by n, then the situation may be
reduced to the one just described by simply adding to the partial fraction
expansion (8-23) the extra terms

$$R_0 + R_1 x + R_2 x^2 + \cdots + R_n x^n.$$

This result can also be achieved by first dividing N(x) by D(x). The circum- 
stances usually dictate which approach is the easier. 

Example 8-13 Evaluate

$$I = \int \left(\frac{x^3 + 5x^2 + 9x + 5}{x^2 + 3x + 1}\right) dx.$$

Solution Here, as the degree of N(x) only exceeds that of D(x) by one, we
shall start by dividing the integrand to get

$$\frac{x^3 + 5x^2 + 9x + 5}{x^2 + 3x + 1} = x + 2 + \frac{2x + 3}{x^2 + 3x + 1}$$

when

$$I = \int (x + 2)\,dx + \int \frac{2x + 3}{x^2 + 3x + 1}\,dx.$$

The first antiderivative is trivial, whilst the second is of the form discussed in
Case (d), so that

$$I = \frac{x^2}{2} + 2x + \log |x^2 + 3x + 1| + C.$$
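
The preliminary division step used in Example 8-13 is ordinary polynomial long division. A minimal sketch with exact rational coefficients follows; the function name `poly_divmod` and the convention of listing coefficients from the highest power downwards are assumptions made for this illustration.

```python
from fractions import Fraction


def poly_divmod(num, den):
    """Divide polynomials given as coefficient lists, highest power first.

    Returns (quotient, remainder) as lists of Fractions.
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    while len(num) >= len(den):
        factor = num[0] / den[0]          # next quotient coefficient
        quotient.append(factor)
        for i, d in enumerate(den):       # subtract factor * den from the leading terms
            num[i] -= factor * d
        num.pop(0)                        # leading coefficient is now zero
    return quotient, num


if __name__ == "__main__":
    # (x^3 + 5x^2 + 9x + 5) / (x^2 + 3x + 1)
    q, r = poly_divmod([1, 5, 9, 5], [1, 3, 1])
    print(q, r)
```

Applied to the integrand of Example 8-13, the quotient coefficients correspond to x + 2 and the remainder to 2x + 3.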

Example 8-14 Evaluate

$$I = \int \frac{x\,dx}{(x + 2)^2 (x - 1)}.$$

Solution In this case we must adopt the partial fraction expansion

$$\frac{x}{(x + 2)^2 (x - 1)} = \frac{A}{x + 2} + \frac{B}{(x + 2)^2} + \frac{C}{x - 1}.$$

Cross-multiplication gives

$$x = A(x + 2)(x - 1) + B(x - 1) + C(x + 2)^2$$

or

$$x = A(x^2 + x - 2) + B(x - 1) + C(x^2 + 4x + 4).$$

Equating coefficients of equal powers of x gives:

Coefficient of $x^2$: 0 = A + C

Coefficient of $x$: 1 = A + B + 4C

Coefficient of $x^0$: 0 = -2A - B + 4C,

showing that A = -1/9, B = 2/3, and C = 1/9. We may thus write

$$\int \frac{x\,dx}{(x + 2)^2 (x - 1)} = -\frac{1}{9} \int \frac{dx}{x + 2} + \frac{2}{3} \int \frac{dx}{(x + 2)^2} + \frac{1}{9} \int \frac{dx}{x - 1}.$$

These antiderivatives were all discussed in Case (a), so that using those results
we obtain

$$I = -\frac{1}{9} \log |x + 2| - \frac{2}{3} \cdot \frac{1}{x + 2} + \frac{1}{9} \log |x - 1| + C.$$
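
Equating coefficients, as in Example 8-14, always produces a linear system for the undetermined coefficients. The sketch below solves that system exactly by Gauss-Jordan elimination over the rationals; the helper name `solve_linear` and the augmented-matrix layout are assumptions for illustration, and the routine presumes the system has a unique solution.

```python
from fractions import Fraction


def solve_linear(augmented):
    """Solve a square linear system given as rows [a_1, ..., a_n, rhs]."""
    n = len(augmented)
    m = [[Fraction(v) for v in row] for row in augmented]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]


if __name__ == "__main__":
    # Unknowns (A, B, C); rows are the x^2, x and x^0 equations of Example 8-14.
    system = [
        [1, 0, 1, 0],
        [1, 1, 4, 1],
        [-2, -1, 4, 0],
    ]
    print(solve_linear(system))
```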

Example 8-15 Find the antiderivative

$$I = \int \frac{x^4 - x^3 + 5x^2 + x + 3}{(x + 1)(x^2 - x + 1)^2}\,dx.$$

Solution Here $N(x) = x^4 - x^3 + 5x^2 + x + 3$ and $D(x) = (x + 1)(x^2 - x + 1)^2$,
so that the degree of N(x) is 4 and the degree of D(x) is 5. Following
on from our earlier reasoning we must set

$$\frac{x^4 - x^3 + 5x^2 + x + 3}{(x + 1)(x^2 - x + 1)^2} = \frac{A}{x + 1} + \frac{Bx + C}{x^2 - x + 1} + \frac{Dx + E}{(x^2 - x + 1)^2}.$$

Cross-multiplication gives the identity

$$x^4 - x^3 + 5x^2 + x + 3 = A(x^2 - x + 1)^2 + (Bx + C)(x + 1)(x^2 - x + 1) + (Dx + E)(x + 1).$$

Instead of expanding the right-hand side and then equating coefficients of
equal powers of x as in the previous example, we shall use the fact that
(x + 1) is a factor of D(x) to simplify this expression. Setting x = -1 we
find that 9 = 9A, or A = 1 and so

$$x^4 - x^3 + 5x^2 + x + 3 = (x^2 - x + 1)^2 + (Bx + C)(x^3 + 1) + (Dx + E)(x + 1),$$

whence

$$x^3 + 2x^2 + 3x + 2 = (Bx + C)(x^3 + 1) + (Dx + E)(x + 1).$$

Having eliminated A we now proceed as before and equate coefficients of
equal powers of x to find B, C, D, and E:

Coefficient of $x^4$: 0 = B

Coefficient of $x^3$: 1 = C

Coefficient of $x^2$: 2 = D

Coefficient of $x$: 3 = B + E + D

Coefficient of $x^0$: 2 = C + E.

Thus, B = 0, C = 1, D = 2, E = 1 and so

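$$I = \int\frac{dx}{x+1} + \int\frac{dx}{x^2-x+1} + \int\frac{2x+1}{(x^2-x+1)^2}\,dx = I_1 + I_2 + I_3.$$

Now

$$I_1 = \int\frac{dx}{x+1} = \log|x+1| + C_1$$

and

$$I_2 = \int\frac{dx}{(x-\tfrac12)^2 + (\sqrt3/2)^2} = \frac{2}{\sqrt3}\arctan\left(\frac{2x-1}{\sqrt3}\right) + C_2.$$

To evaluate $I_3$ write

$$I_3 = \int\frac{2x-1}{(x^2-x+1)^2}\,dx + \int\frac{2\,dx}{(x^2-x+1)^2} = \frac{-1}{x^2-x+1} + \int\frac{2\,dx}{[(x-\tfrac12)^2+(\sqrt3/2)^2]^2}.$$

Next, setting $x - \tfrac12 = (\sqrt3/2)\tan\theta$, so that $dx = (\sqrt3/2)\sec^2\theta\,d\theta$, gives

$$\int\frac{2\,dx}{[(x-\tfrac12)^2+(\sqrt3/2)^2]^2} = \int\frac{\sqrt3\sec^2\theta\,d\theta}{(\tfrac34\sec^2\theta)^2} = \frac{16\sqrt3}{9}\int\cos^2\theta\,d\theta.$$

Using the identity $\cos^2\theta = \tfrac12(1 + \cos 2\theta)$ this may be evaluated to give

$$\int\frac{2\,dx}{[(x-\tfrac12)^2+(\sqrt3/2)^2]^2} = \frac{8\sqrt3}{9}\left[\theta + \tfrac12\sin 2\theta\right] + C_3 = \frac{8\sqrt3}{9}\left[\arctan\left(\frac{2x-1}{\sqrt3}\right) + \frac{\sqrt3}{4}\left(\frac{2x-1}{x^2-x+1}\right)\right] + C_3.$$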




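Hence we have shown that

$$I_3 = \frac{-1}{x^2-x+1} + \frac{8\sqrt3}{9}\left[\arctan\left(\frac{2x-1}{\sqrt3}\right) + \frac{\sqrt3}{4}\left(\frac{2x-1}{x^2-x+1}\right)\right] + C_3.$$

Adding $I_1$, $I_2$, and $I_3$ to find $I$ finally gives

$$I = \log|x+1| + \frac{14\sqrt3}{9}\arctan\left(\frac{2x-1}{\sqrt3}\right) + \frac{4x-5}{3(x^2-x+1)} + C.$$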

A factor $(x^2 - x + 1)^3$ in the denominator would have led to $\int\cos^4\theta\,d\theta$
and so, in general, we would obtain antiderivatives of the form $\int\cos^{2n}\theta\,d\theta$.
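
The coefficients found in Example 8-15 can be checked without any calculus by multiplying out the right-hand side of the cross-multiplied identity. The following sketch, which is an illustration added here and not part of the original text, represents each polynomial as a list of coefficients with the constant term first; the helper names `poly_mul` and `poly_add` are assumptions introduced for the purpose.

```python
def poly_mul(p, q):
    # Multiply polynomials given as coefficient lists, constant term first
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(*polys):
    # Add polynomials of possibly different lengths
    n = max(len(p) for p in polys)
    return [sum(p[k] if k < len(p) else 0 for p in polys) for k in range(n)]

A, B, C, D, E = 1, 0, 1, 2, 1      # values found in Example 8-15
quad = [1, -1, 1]                  # x^2 - x + 1
lin = [1, 1]                       # x + 1

# A(x^2 - x + 1)^2 + (Bx + C)(x + 1)(x^2 - x + 1) + (Dx + E)(x + 1)
rhs = poly_add(
    poly_mul([A], poly_mul(quad, quad)),
    poly_mul([C, B], poly_mul(lin, quad)),
    poly_mul([E, D], lin),
)
numerator = [3, 1, 5, -1, 1]       # x^4 - x^3 + 5x^2 + x + 3
print(rhs == numerator)
```

Agreement of the two coefficient lists confirms the identity for all x, which is exactly what equating coefficients of equal powers of x requires.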



8-6 Other special techniques of integration 

A great variety of different methods exist for evaluating particular types of 
antiderivative, and in this final section we illustrate only a few specially 
useful ones with the help of some examples. Extensive tables of integrals are 
readily available and, where possible, should be used to minimise tedious 
manipulation. 

8-6 (a) Substitution t = tan x/2 

If we write t = tan x/2 it is easily proved by means of trigonometric identities
that

$$\sin x = \frac{2t}{1 + t^2} \quad\text{and}\quad \cos x = \frac{1 - t^2}{1 + t^2}. \tag{8-24}$$

Using these results we can also establish the differential relation

$$dx = \frac{2\,dt}{1 + t^2}. \tag{8-25}$$

Consequently, in principle, any rational fraction R(sin x, cos x) that involves
only the sine and cosine functions may be transformed by means of (8-24)
into a rational fraction involving t. On account of this result and (8-25),
it then follows that

$$I = \int R(\sin x, \cos x)\,dx = \int R\left(\frac{2t}{1 + t^2}, \frac{1 - t^2}{1 + t^2}\right)\frac{2\,dt}{1 + t^2}.$$

Thus I has been transformed into an antiderivative of a rational function
involving t.

Example 8-16 Evaluate

$$I = \int\frac{\cos x\,dx}{1 + \sin x}.$$




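Solution Transforming to the variable t as indicated above gives

$$I = \int\frac{2(1 - t)}{(1 + t)(t^2 + 1)}\,dt.$$

It is readily established that

$$\frac{2(1 - t)}{(1 + t)(1 + t^2)} = \frac{2}{1 + t} - \frac{2t}{1 + t^2},$$

showing that

$$I = \int\frac{2\,dt}{1 + t} - \int\frac{2t\,dt}{1 + t^2} = \log(1 + t)^2 - \log(1 + t^2) + C.$$

Thus

$$I = \log\left[\frac{(1 + t)^2}{1 + t^2}\right] + C,$$

whence from (8-24),

$$I = \log(1 + \sin x) + C.$$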
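
The substitution can be illustrated numerically. The sketch below is an added illustration with assumed function names: it confirms at a few sample angles that the expressions (8-24) reproduce sin x and cos x, and that the antiderivative obtained in terms of t, namely log(1 + t)^2 - log(1 + t^2), agrees with log(1 + sin x).

```python
import math

def sin_cos_from_t(t):
    # Equations (8-24): sin x and cos x expressed through t = tan(x/2)
    return 2 * t / (1 + t * t), (1 - t * t) / (1 + t * t)

def antiderivative_in_t(t):
    # Result in the variable t, with the arbitrary constant omitted
    return math.log((1 + t) ** 2) - math.log(1 + t * t)

for x in (-1.0, 0.3, 2.0):
    t = math.tan(x / 2)
    s, c = sin_cos_from_t(t)
    print(x,
          math.isclose(s, math.sin(x)),
          math.isclose(c, math.cos(x)),
          math.isclose(antiderivative_in_t(t), math.log(1 + math.sin(x))))
```

The sample angles avoid x = π (where t is undefined) and x = -π/2 (where 1 + sin x vanishes).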



8-6 (b) Integration of $R[x, \sqrt{ax^2 + bx + c}]$

We define $R[x, \sqrt{ax^2 + bx + c}]$ to be a rational fraction involving x and
$\sqrt{ax^2 + bx + c}$. Special cases of this general type in which b = 0 have
been encountered in Examples 8-2 and 8-5 where it was shown that the
substitutions x = sin u and x = sinh u can be used to reduce the integrand to
one involving only trigonometric or hyperbolic functions. If it is of
trigonometric type then the technique of (a) above may be used to reduce the
integrand further to a rational function. If the integrand is of hyperbolic
type then the substitution

t = tanh x/2,

together with

$$\sinh x = \frac{2t}{1 - t^2} \quad\text{and}\quad \cosh x = \frac{1 + t^2}{1 - t^2} \tag{8-26}$$

and the differential relation

$$dx = \frac{2\,dt}{1 - t^2}, \tag{8-27}$$

will again reduce the integrand to a rational function.

If b ≠ 0, then completing the square under the square root sign gives

$$\sqrt{ax^2 + bx + c} = \sqrt{a\left[\left(x + \frac{b}{2a}\right)^2 + \left(\frac{c}{a} - \frac{b^2}{4a^2}\right)\right]}.$$

The substitution u = x + (b/2a) will then reduce the problem to one of the
two special cases just discussed, according to the signs of a and
$[(c/a) - (b^2/4a^2)]$.

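Example 8-17 Evaluate

$$I = \int\frac{dx}{\sqrt{2 - 3x - 4x^2}}.$$

Solution First we complete the square under the square root sign to obtain

$$I = \int\frac{dx}{\sqrt{4[41/64 - (x + 3/8)^2]}}.$$

Then, setting $u = x + \tfrac38$ this becomes

$$I = \frac12\int\frac{du}{\sqrt{(41/64) - u^2}} = \frac12\arcsin\frac{8u}{\sqrt{41}}$$

and thus

$$I = \frac12\arcsin\left(\frac{8x + 3}{\sqrt{41}}\right) + C.$$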
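
The completing-the-square step is purely mechanical and can be sketched in a few lines. The helper below is an illustration added here, with an assumed name `complete_square`; it returns h = b/2a and k = c/a - b^2/4a^2 in exact rational arithmetic, so that ax^2 + bx + c = a[(x + h)^2 + k].

```python
from fractions import Fraction

def complete_square(a, b, c):
    # Return (h, k) such that a*x^2 + b*x + c = a*((x + h)^2 + k)
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    h = b / (2 * a)
    k = c / a - b * b / (4 * a * a)
    return h, k

# Quadratic of Example 8-17 written as -4x^2 - 3x + 2
h, k = complete_square(-4, -3, 2)
print(h, k)
```

For the quadratic 2 - 3x - 4x^2 this gives h = 3/8 and k = -41/64, that is, -4[(x + 3/8)^2 - 41/64] = 4[41/64 - (x + 3/8)^2]. Here a is negative and k is negative, which is the combination that leads to an arcsine.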

8-6 (c) Integration by means of differentiation under integral sign 

This approach utilizes the idea of differentiation under the integral sign with 
respect to a parameter. It relies on finding a known antiderivative involving a 
parameter a, say, with the property that the derivative of its integrand with 
respect to this parameter a is capable of being simply related to the integrand 
of the desired antiderivative. Specifically, the method uses the result that if 

$$F(x, a) = \int f(x, a)\,dx$$

then,

$$\frac{\partial F(x, a)}{\partial a} = \int\frac{\partial f(x, a)}{\partial a}\,dx.$$



Example 8-18 Evaluate by means of differentiation under the integral sign
the antiderivative

$$\int\frac{dx}{(x^2 + a^2)^{3/2}}.$$

Solution We first note that the integrand $1/(x^2 + a^2)^{3/2}$ is simply related to
the derivative

$$\frac{\partial}{\partial a}\left[\frac{1}{(x^2 + a^2)^{1/2}}\right].$$

Accordingly, let us consider the familiar antiderivative

$$\int\frac{dx}{(x^2 + a^2)^{1/2}} = \operatorname{arcsinh}\frac{x}{a} + C.$$

Then

$$\frac{\partial}{\partial a}\int\frac{dx}{(x^2 + a^2)^{1/2}} = \frac{\partial}{\partial a}\left[\operatorname{arcsinh}\frac{x}{a} + C\right]$$

and so

$$-\int\frac{a\,dx}{(x^2 + a^2)^{3/2}} = -\left(\frac{x}{a^2}\right)\frac{1}{((x/a)^2 + 1)^{1/2}}$$

or,

$$\int\frac{dx}{(x^2 + a^2)^{3/2}} = \frac{x}{a^2(x^2 + a^2)^{1/2}} + C'.$$



The arbitrary constant $C'$ has been added since we are deducing an
antiderivative and not just an indefinite integral.
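
The key step of the method, that differentiating the known antiderivative arcsinh(x/a) with respect to the parameter a yields -a times the antiderivative of 1/(x^2 + a^2)^(3/2), can be checked numerically. The sketch below is an added illustration; the names `F`, `G`, `dF_da`, the step h, and the sample values (all with a > 0) are assumptions.

```python
import math

def F(x, a):
    # Known antiderivative of 1/sqrt(x^2 + a^2), constant omitted, a > 0
    return math.asinh(x / a)

def G(x, a):
    # Antiderivative of 1/(x^2 + a^2)^(3/2) deduced in Example 8-18
    return x / (a * a * math.sqrt(x * x + a * a))

def dF_da(x, a, h=1e-6):
    # Central difference approximating the partial derivative of F in a
    return (F(x, a + h) - F(x, a - h)) / (2 * h)

# Differentiation under the integral sign predicts dF/da = -a * G
for x, a in ((0.5, 1.0), (2.0, 3.0), (-1.5, 0.7)):
    print(x, a, dF_da(x, a) + a * G(x, a))
```

Each printed residual should be close to zero, up to the error of the finite-difference approximation.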

8-6 (d) Integration of trigonometric functions involving multiple angles 

Antiderivatives of products of trigonometric functions involving multiple 
angles are of considerable importance and the most frequently occurring 
ones are : 

$$I_1 = \int\sin mx\cos nx\,dx, \tag{8-28}$$

$$I_2 = \int\sin mx\sin nx\,dx, \tag{8-29}$$

$$I_3 = \int\cos mx\cos nx\,dx. \tag{8-30}$$

These are easily evaluated by appeal to the trigonometric identities:

$$\sin mx\cos nx = \tfrac12[\sin(m + n)x + \sin(m - n)x], \tag{8-31}$$

$$\sin mx\sin nx = \tfrac12[\cos(m - n)x - \cos(m + n)x], \tag{8-32}$$

$$\cos mx\cos nx = \tfrac12[\cos(m + n)x + \cos(m - n)x]. \tag{8-33}$$

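Substitution of these identities into the above antiderivatives produces:

$$I_1 = \begin{cases} -\dfrac12\left[\dfrac{\cos(m - n)x}{m - n} + \dfrac{\cos(m + n)x}{m + n}\right] + C & \text{for } m^2 \ne n^2 \\[2ex] -\dfrac{1}{4m}\cos 2mx + C & \text{for } m = n, \end{cases} \tag{8-34}$$

$$I_2 = \begin{cases} \dfrac12\left[\dfrac{\sin(m - n)x}{m - n} - \dfrac{\sin(m + n)x}{m + n}\right] + C & \text{for } m^2 \ne n^2 \\[2ex] \dfrac{1}{2m}(mx - \sin mx\cos mx) + C & \text{for } m = n, \end{cases} \tag{8-35}$$

$$I_3 = \begin{cases} \dfrac12\left[\dfrac{\sin(m - n)x}{m - n} + \dfrac{\sin(m + n)x}{m + n}\right] + C & \text{for } m^2 \ne n^2 \\[2ex] \dfrac{1}{2m}(mx + \sin mx\cos mx) + C & \text{for } m = n. \end{cases} \tag{8-36}$$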



Example 8-19 Evaluate the following two antiderivatives:

$$I_1 = \int\sin 3x\cos 5x\,dx, \qquad I_2 = \int\sin^2 3x\,dx.$$

Solution The antiderivatives follow immediately by substitution in (8-34)
and (8-35):

$$I_1 = \frac{\cos 2x}{4} - \frac{\cos 8x}{16} + C, \qquad I_2 = \frac{x}{2} - \frac{\sin 3x\cos 3x}{6} + C.$$
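
Formula (8-34) translates directly into a small function. The sketch below is an added illustration (the name `I1` and the sample point are assumptions); it implements both cases of (8-34) with C = 0 and compares the case m = 3, n = 5 with the closed form cos 2x/4 - cos 8x/16 of Example 8-19.

```python
import math

def I1(m, n, x):
    # Formula (8-34) with C = 0: an antiderivative of sin(mx) cos(nx)
    if m == n:
        return -math.cos(2 * m * x) / (4 * m)
    if m == -n:
        raise ValueError("the case m = -n is not covered by (8-34)")
    return -0.5 * (math.cos((m - n) * x) / (m - n)
                   + math.cos((m + n) * x) / (m + n))

x = 0.7
closed_form = math.cos(2 * x) / 4 - math.cos(8 * x) / 16
print(math.isclose(I1(3, 5, x), closed_form))
```

The sign of the first term changes because m - n = -2 is negative, which is why the cos 2x term in the example appears with a plus sign.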



PROBLEMS 

Section 8-1

8-1 Find the following antiderivatives: 

(a) 



(d) 



[4^; (e) ficos4^dx; (f) f 3* d*. 



8-2 Verify by means of differentiation that

$$\int\frac{dx}{\sqrt{x^2 - a^2}} = \log|x + \sqrt{x^2 - a^2}| + C.$$

Compare this form of result with that shown against entry 10 of Table 8.1. 
8-3 Verify by means of differentiation that

$$\int\frac{dx}{a^2 - b^2x^2} = \frac{1}{2ab}\log\left|\frac{a + bx}{a - bx}\right| + C.$$



Compare this more general result with those shown against entries 11 and 12 
of Table 8.1. 



8-4 Verify by means of differentiation that



[ 



Compare this form of result with that shown against entry 9 of Table 8-1. 






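8-5 Use the result of Theorem 8-1 to verify the following general results:

(a) $\int\dfrac{f'}{f}\,dx = \log|f| + C$;

(b) $\int f^{(n+1)}g\,dx = gf^{(n)} - g'f^{(n-1)} + g''f^{(n-2)} + \cdots + (-1)^n g^{(n)}f + (-1)^{n+1}\int g^{(n+1)}f\,dx$;

(c) $\int\left(\dfrac{gf' - fg'}{g^2}\right)dx = \dfrac{f}{g} + C$;

(d) $\int\left(\dfrac{f'}{f} - \dfrac{g'}{g}\right)dx = \log\left|\dfrac{f}{g}\right| + C$.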



8-6 Apply the results of Problem 8-5 together with some slight manipulation to 
determine the following antiderivatives : 

, . f/2x sin x — x 2 cos x\ , 

(a) —„ Ax; 

' ' sin 2 x ' 



J( 

(b) j{ J?2*Z*ll ) dx; 



(c) Jx 2 e*/ 2 dx; 



, f/x sinhx — 3 cosh x\ 
J \ x cosh x J 



8-7 Evaluate the following antiderivatives: 

(a) J (x 2 + 3 sin x + l)dx; (b) J (4* + 2 cos 2x)dx; 

(c) J (4 sinh x + sin x)dx; (d) J (e M + 3)d.v. 

8-8 Use the following identities to evaluate the four antiderivatives listed below:

$$\sinh mx\cosh nx = \tfrac12[\sinh(m + n)x + \sinh(m - n)x]$$
$$\sinh mx\sinh nx = \tfrac12[\cosh(m + n)x - \cosh(m - n)x]$$
$$\cosh mx\cosh nx = \tfrac12[\cosh(m + n)x + \cosh(m - n)x]$$

(a) $\int\sinh 4x\cosh 2x\,dx$; (b) $\int\sinh x\sinh 3x\,dx$;

(c) $\int\cosh 4x\cosh 2x\,dx$; (d) $\int\cosh^2 2x\,dx$.

Section 8-2 

Use the indicated substitutions to evaluate the following antiderivatives. 

8-9 f d * .. , x = l/ii. 
J xv(x 2 — 4) 

810 J V(l - x 2 )dx, a: = sin u. 

« -.-. C tanh x dx , . 

811 TT, — Z rC> cosh x=l + u 2 . 

J 2 V(cosh x — 1) 



8-12 J cos *\/sin x dx, sin x = u. 
8-13 J x(3x 2 + l) 5 dx, 3x 2 + 1 = u. 






8 - 14 /v§TT)' ^+0-.. 



Evaluate the following antiderivatives by means of a suitable trigonometric 
substitution. 



J V(l - * a ) 
f V(x* + 



816 I vv ~ ' J) dx. 



817 I -=^- -dx. 



Evaluate the following definite integrals. 
8-18 f (3x + 1) sinh (x 3 + x + 3)dx. 

8-19 f *V(1 + * a )d*- 



f V(* - 



8-20 f a/(* - 2)dx. 



Section 8-3 

Evaluate the following antiderivatives using the technique of integration by parts. 
8-22 J Q ax sin x dx. 
8-23 J xe ax dx. 



8-24 



f xdx 
J sin 2 x 



8-25 J sin x sinh x dx. \ 

8-26 J 7* cos x dx. 
8-27 J log 2 x dx. 
8-28 J x arcsin x dx. 

Section 8-4

8-29 Given that $I_n = \int(1 - x^3)^n\,dx$, where n is an integer, show that

$$(3n + 1)I_n = x(1 - x^3)^n + 3nI_{n-1}.$$

Hence prove that

$$\int_0^1(1 - x^3)^5\,dx = \frac{3^6}{2^4\cdot 7\cdot 13}.$$



8-30 The integral $I_m$ is defined by



lm — 

Show that 



x 

— r — — -j dx for integral m >. 0. 

(x 2 + l) m+3 6 



_ m + 2 

Jm-l — 7 lm, 

m — I 
and by using the substitution x = tan prove that 



1 



sin 7 cos 5 6 d0 = — • 
o 



8-31 Show that for integral n > 1, 

x" sin x dx = n \ x n ~ x cos x dx 
Jo Jo 

and 

J' Jit /*i>7 

x" cos x dx = (iw)» — n x" -1 sin x dx 
o Jo 

Use the result to evaluate 

x 2 cos x dx. 

Jo 

8-32 The function $I_{p,q}$ is defined by

$$I_{p,q} = \int x^p(\log x)^q\,dx$$

in which p, q are positive integers. Show that

$$(p + 1)I_{p,q} + qI_{p,q-1} = x^{p+1}(\log x)^q.$$

8-33 If

$$T_n = \int\tan^n\theta\,d\theta,$$

where n ≠ 1 is a positive integer, show that

$$T_n = \frac{\tan^{n-1}\theta}{n - 1} - T_{n-2}.$$

Use this result to evaluate

$$\int\tan^6\theta\,d\theta.$$

8-34 The function $I_{m,n}$ is defined by

$$I_{m,n} = \int x^m(a + bx)^n\,dx,$$

in which m, n are positive integers. Prove that

$$b(m + n + 1)I_{m,n} + maI_{m-1,n} = x^m(a + bx)^{n+1}.$$

8-35 The function $I_{m,n}$ is defined by

$$I_{m,n} = \int\sin^m\theta\cos^n\theta\,d\theta,$$

in which m, n are positive integers. Show that $I_{m,n}$ satisfies the reduction
formula

$$(m + n)I_{m,n} - (n - 1)I_{m,n-2} = \sin^{m+1}\theta\cos^{n-1}\theta.$$

Section 8-5 

Evaluate the following antiderivatives by means of partial fractions. 



8-36 f ^ 

J (x - l)(x + 2){x + 3) 

8 . 37 r *;-f + v 

J x 2 - 5x + 6 

838 f 3 f >4 . • 

J x 6 — 2x i + x 

8-41 f ^ 

J x 3 - 4x 2 + 5x - 2 

C x 2 + 2 
8 -43j ( , + 1) ; x _ 2) d- 

f- *4 + 4*3+lb; 2 +12*+8 

J (x 2 + 2x + 3) 2 (x + 1) 

Section 8-6 

Evaluate the following antiderivatives by means of the substitution t = tan x/2.

8-45 $\int\dfrac{dx}{3 + 5\cos x}$.

8-46 $\int\dfrac{dx}{\sin x + \cos x}$.

8-47 $\int\dfrac{dx}{8 - 4\sin x + 7\cos x}$.

8-48 $\int\dfrac{\sin x}{(1 - \cos x)^3}\,dx$.

Evaluate the following antiderivatives by means of one or more suitable sub- 
stitutions. 



r dx 

J V(2 + 3x - 



849 , ,_ „ ^ 









3x- 6 

dx. 



V(x 2 - 4x + 5) 
dx 



xV(.l - * 2 ) 
8-52 J v(* 2 + 2x + 5) dx 



8-53 J (X-1W&- 



2) 



8-54 J V(* - x 2 ) dx. 

Use the technique of differentiation under the integral sign to evaluate the 
following antiderivatives. 

8-55 f xe ax dx. 

8-56 / (*» +* a y (Hint: start from / ^TI 2 " in Table 8 ' L) 

8-57 / (a 2 -^)3/2 - (Hint : Start from / v^^- * 2 ) '" TaWe 81 - } 

8-58 J xa x dx. (Hint : Start from J a x dx in Table 8- 1 .) 
Evaluate the following trigonometric antiderivatives. 
8-59 J cos x cos 2x dx. 

8-60 J sin ax sin (ox + e)dx, a, e non-zero constants. 
8-61 j cos x cos 2 3x dx. 

8-62 J sin x sin 2x sin 3x dx. 

Use the results of this chapter together with Definitions 7-4 and 7-5 of Chapter 7 
to classify the following improper integrals as convergent or divergent. Determine 
the value of all improper integrals that are convergent stating any conditions that 
must be imposed to ensure this. 



.65 r dx 

J (1 + x)Vx 

/* 00 

66 j cos x * 

/*oo 

•68 e^i 



8-66 I cos x dx. 



8-68 e^dx. 



Linear transformations 
and matrices 



9-1 Introductory ideas 

This chapter is concerned with the branch of mathematics known as linear 
algebra. One aspect of this subject has already been encountered, namely 
vectors, and it is now necessary to develop in a more general context various 
of the ideas that were first introduced there. Central to the entire subject is 
the fundamental idea that the algebraic operations of addition, subtraction, 
and multiplication can be made meaningful when applied to an array of 
numbers or functions considered as a single entity. 

An example will help here to indicate one of the many different ways in 
which such an array may arise, and at the same time to show something of 
the type of algebra it is reasonable to want to perform on an array. Three 
chemical plants numbered 1 to 3 each have separate sources of raw material 
from which each one produces the same four products numbered 1 to 4. Let 
plant number m produce product number n at a cost a mn units per ton, then 
the production costs of the complex of chemical plants is conveniently 
characterized by the following table of the twelve quantities a mn . 



Table 9-1

                     Product
  Plant       1          2          3          4
    1      $a_{11}$   $a_{12}$   $a_{13}$   $a_{14}$
    2      $a_{21}$   $a_{22}$   $a_{23}$   $a_{24}$
    3      $a_{31}$   $a_{32}$   $a_{33}$   $a_{34}$

In writing this table or array of quantities $a_{mn}$ we have used the convention
that the first of the two suffixes attached to the quantity $a_{mn}$ refers to the row
number in which $a_{mn}$ appears, and the second to the column number. Thus
the entry $a_{23}$ occurs in row 2, column 3, whilst the entry $a_{32}$ occurs in row 3,
column 2. The important use of suffixes in this way is strictly analogous to a
map reference in which the first entry is a latitude and the second a longitude.
Thus the double suffix notation used here serves to identify the position in
the array to which the associated quantity is assigned.

On account of the use to which the suffixes have been put, we can now
dispense with the extreme left-hand column and the top row of Table 9-1,
which only serve for identification purposes, and write instead

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} \tag{9-1}$$



with the understanding that the symbol A represents the array of quantities 
originally contained in Table 9-1. 

Returning now to the physical situation from which the array (9-1) was
derived, let us suppose that at some time the quality of the raw materials
changes, so that a revised Table 9-1 then applies in which entry $a_{mn}$ is replaced
by the new entry $b_{mn}$. Then, in terms of our concise notation, we can
characterize this new situation by defining an array B as follows:

$$\mathbf{B} = \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \\ b_{31} & b_{32} & b_{33} & b_{34} \end{bmatrix} \tag{9-2}$$

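In terms of the information at our disposal, we know that the change in
the cost of product n from chemical plant m is $a_{mn} - b_{mn}$, whilst the average
cost of product n from plant m is $\tfrac12(a_{mn} + b_{mn})$. Hence, if C is the array of
change in costs of products and D is the array of the average costs of
products, in our new notation we may write:

$$\mathbf{C} = \begin{bmatrix} a_{11} - b_{11} & a_{12} - b_{12} & a_{13} - b_{13} & a_{14} - b_{14} \\ a_{21} - b_{21} & a_{22} - b_{22} & a_{23} - b_{23} & a_{24} - b_{24} \\ a_{31} - b_{31} & a_{32} - b_{32} & a_{33} - b_{33} & a_{34} - b_{34} \end{bmatrix} \tag{9-3}$$

and

$$\mathbf{D} = \begin{bmatrix} \tfrac12(a_{11} + b_{11}) & \tfrac12(a_{12} + b_{12}) & \tfrac12(a_{13} + b_{13}) & \tfrac12(a_{14} + b_{14}) \\ \tfrac12(a_{21} + b_{21}) & \tfrac12(a_{22} + b_{22}) & \tfrac12(a_{23} + b_{23}) & \tfrac12(a_{24} + b_{24}) \\ \tfrac12(a_{31} + b_{31}) & \tfrac12(a_{32} + b_{32}) & \tfrac12(a_{33} + b_{33}) & \tfrac12(a_{34} + b_{34}) \end{bmatrix} \tag{9-4}$$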

The form of these results is suggestive, for it would seem that by defining 
subtraction of two similar arrays to mean the array formed by the subtraction 
of corresponding elements, we may write 



C = A - B. 



(9-5) 



Similarly, if addition of two similar arrays is taken to mean the array 
formed by the addition of corresponding entries, and the multiplication of an 
array by a factor is taken to mean the array formed by the multiplication of 
each entry by that factor, we may write 



$$\mathbf{D} = \tfrac12(\mathbf{A} + \mathbf{B}). \tag{9-6}$$
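
The entry-by-entry definitions of subtraction, addition, and multiplication by a factor can be sketched with nested lists. The code below is an illustration added here: the function names are assumptions, and the cost figures are hypothetical numbers invented for the purpose, since the text gives no numerical costs.

```python
def add(A, B):
    # Sum of two arrays of the same shape, formed entry by entry
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def subtract(A, B):
    # Difference of two arrays of the same shape, formed entry by entry
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(k, A):
    # Product of the factor k with every entry of the array
    return [[k * a for a in row] for row in A]

# Hypothetical costs per ton: 3 plants (rows) by 4 products (columns)
A = [[10, 12, 9, 14], [11, 13, 8, 15], [9, 12, 10, 16]]
B = [[12, 12, 10, 13], [11, 15, 9, 15], [10, 11, 10, 18]]

C = subtract(A, B)          # change in costs, C = A - B
D = scale(0.5, add(A, B))   # average costs, D = (1/2)(A + B)
print(C)
print(D)
```

Both operations only make sense when the two arrays have the same shape, which is what the text means by "similar arrays".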




Hence, in a natural manner, we are starting to perform what appears to 
be conventional algebraic operations on an entire array of numbers, rather 
than on the individual entries in the arrays themselves. In mathematical 
terms an array of the form shown on the right-hand side of Eqn (9-1) is called 
a matrix of order (3 X 4). Here, analogous to the double suffix notation 
already introduced, the first number is taken to refer to the total number of 
rows in the matrix and the second number to refer to the total number of 
columns in the matrix. 

In terms of the simple physical situation used to introduce the notion of a
matrix and its associated algebra we have so far given no indication of the
interpretation to be placed upon multiplication. To elucidate the form taken
by this operation when applied to matrices, we again return to our physical
situation and consider the cost of buying $c_1$, $c_2$, $c_3$, and $c_4$ tons, respectively,
of products 1, 2, 3, and 4 from each of the three chemical plants in turn. If
the product costs are as shown in Table 9-1, and the costs of the orders are
denoted by $d_1$, $d_2$, and $d_3$, it is readily seen that

$$\begin{aligned} d_1 &= a_{11}c_1 + a_{12}c_2 + a_{13}c_3 + a_{14}c_4 \\ d_2 &= a_{21}c_1 + a_{22}c_2 + a_{23}c_3 + a_{24}c_4 \\ d_3 &= a_{31}c_1 + a_{32}c_2 + a_{33}c_3 + a_{34}c_4. \end{aligned} \tag{9-7}$$

In terms of the matrix A in Eqn (9-1), the right-hand side of the first
equation in (9-7) is obtained by multiplying successive entries in the first row
of A by $c_1$, $c_2$, $c_3$, and $c_4$, respectively, and then adding the four products.
The same process will generate the right-hand side of both the second and
third equation in (9-7), provided that the entries in the second and third rows
of matrix A are used in place of those in the first row. If the four numbers
$c_1$, $c_2$, $c_3$, and $c_4$ are arranged in a column which is then regarded as a (4 x 1)
matrix, the basic operation of matrix multiplication is seen to be the
multiplication of a row of the first matrix into the column of the second to yield a
single number. Thus, in terms of the first row of A expressed as a (1 x 4)
matrix, we have the definition

$$a_{11}c_1 + a_{12}c_2 + a_{13}c_3 + a_{14}c_4 = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix}$$

where juxtaposition is used to imply multiplication of the row and column
matrices on the right-hand side.

Similarly, in terms of the second row of A expressed as a (1 x 4) matrix,
our definition yields

$$a_{21}c_1 + a_{22}c_2 + a_{23}c_3 + a_{24}c_4 = \begin{bmatrix} a_{21} & a_{22} & a_{23} & a_{24} \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix}$$

and a corresponding result is also true for the third row of A when expressed
as a (1 x 4) matrix. This special form of product is called either the inner
product or the scalar product of a row matrix and a column matrix.

Collectively these results suggest that we should write Eqns (9-7) in the 
matrix form 



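$$\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} \tag{9-8}$$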



with the understanding, as before, that multiplication is implied by juxta- 
position and means the inner product of rows of the first matrix with the 
column of the second matrix. To be consistent, equality of two matrices must 
then be taken to mean the equality of corresponding entries in two matrices 
of similar order. Using this convention our suffix notation works for us in 
the sense that the row number and the column number, taken in that order, 
which are involved in an inner product are the row and column numbers of 
the location into which that product is to be put. Thus in matrix equation 
(9-8), the number $d_2$ is in row 2, column 1 of the left-hand column matrix,
and it is the result of forming the inner product of row 2 of the first matrix
on the right-hand side with column 1 of the second matrix. (The second
matrix here only has one column.)

If the column matrix with entries $d_1$, $d_2$, $d_3$ is denoted by D, and the column
matrix with entries $c_1$, $c_2$, $c_3$, and $c_4$ is denoted by C, then Eqn (9-8) can be
reduced to the deceptively simple equation

$$\mathbf{D} = \mathbf{A}\mathbf{C}. \tag{9-9}$$



It should be noticed that the resemblance to the algebra of real numbers ends 
here, because although multiplication is a commutative operation for real 
numbers, it is an easy task for the reader to verify that the matrix product 
CA is not even defined for the matrices involved here. Later we shall see that 
the non-commutative character of matrix multiplication is not the only 
difference between the field of real numbers and matrices. The result of matrix 
multiplication using numbers is illustrated in the following example: 



$$\begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & 1 & 1 & 3 \\ 1 & 2 & 1 & 4 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \\ 0 \end{bmatrix}.$$
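
The row-into-column rule can be sketched as follows. This is an illustration added here, with an assumed helper name `inner`; the numerical entries are those of a (3 x 4) matrix and a (4 x 1) column whose product is the column with entries 4, -2, 0.

```python
def inner(row, col):
    # Inner (scalar) product of a row matrix and a column matrix
    if len(row) != len(col):
        raise ValueError("row and column must have the same number of entries")
    return sum(r * c for r, c in zip(row, col))

A = [[1, 2, 1, 0],
     [0, 1, 1, 3],
     [1, 2, 1, 4]]
c = [2, 1, 0, -1]   # the (4 x 1) column, stored as a flat list

# Entry m of the product is the inner product of row m of A with the column
d = [inner(row, c) for row in A]
print(d)
```

Storing the column as a flat list is a convenience of this sketch; the result `d` likewise stands for a (3 x 1) column.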



We remark in passing that the name scalar product of a row matrix and
a column matrix derives from a comparison with the scalar product of two
vectors. Namely, if $\boldsymbol\alpha = \alpha_1\mathbf{i} + \alpha_2\mathbf{j} + \alpha_3\mathbf{k}$, $\boldsymbol\beta = \beta_1\mathbf{i} + \beta_2\mathbf{j} + \beta_3\mathbf{k}$ are two
vectors, then $\boldsymbol\alpha\cdot\boldsymbol\beta = \alpha_1\beta_1 + \alpha_2\beta_2 + \alpha_3\beta_3$, which is just the result of forming
the inner product of a row matrix with entries $\alpha_1$, $\alpha_2$, $\alpha_3$ and a column matrix
with entries $\beta_1$, $\beta_2$, $\beta_3$. Because of this similarity it is customary to refer to
matrices comprising only one row or one column as row vectors or column
vectors, respectively. Thus a general (1 x n) row vector may be considered as
a matrix representation of an ordinary form of vector having n components,
and which belongs to an n-dimensional space.

This simple idea proves to be very fruitful in more advanced accounts of
linear algebra where it leads to the study of what are called n-dimensional
vector spaces. These spaces have properties very similar to those discussed in
Chapter 4 and, as in three dimensions, the scalar product is related to the
geometrical operation of projection in the space. In an n-dimensional vector
space a fundamental set of row or column vectors called a basis takes the
place of the unit vectors i, j, and k and leads to the important idea of linear
independence which will be examined later.

Because of the shape of the array, a general (m x n) matrix is called a
rectangular matrix. The rule just devised for the product of a (3 x 4) matrix
and a (4 x 1) column vector also applies to the product AB of two rectangular
matrices A and B, provided only that the number of columns in A is equal to
the number of rows of B. This last requirement follows directly from the
concept of an inner product which is only defined when the number of entries
in a row of A is equal to the number of entries in a column of B. Once again
the suffix notation works for us, because the inner product of row p of matrix
A and column q of matrix B is the number $c_{pq}$, which is found in row p and
column q of the product matrix C = AB. Consider the following example
which illustrates the application of this rule:



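$$\begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & 1 & 1 & 3 \\ 1 & 2 & 1 & 4 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 1 & 2 \\ 0 & 2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 7 \\ -2 & 7 \\ 0 & 11 \end{bmatrix}.$$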



Then, for example, the entry in row 3, column 2 of the product matrix is the 
number 11, which is the inner product of row 3 of the first matrix involved in 
the product and column 2 of the second matrix involved in the product. 






Notice that the rule for forming an inner product also determines the 
shape of the product matrix C = AB, for C must have as many rows as A 
and as many columns as B. (Think about this and check it.) In fact these 
arguments may be formulated into a useful short-hand rule for checking that 
two matrices are conformable for multiplication, and at the same time 
displaying the shape of the product matrix. 

Rule 1 (Multiplication conformability rule) 

If A is an (m x n) matrix and B is a (p x q) matrix, then the matrix product 
AB may be formed provided n = p. The resultant product matrix then has 
the form (m X q). Symbolically we write this 

(m X n)(p X q) = (m X q) only if n = p.

Thus matrix products of the form (3 X 7)(7 X 2) are conformable for 
multiplication and yield a (3 x 2) matrix. Matrix products of the form 
(7 X 3)(5 X 4) are not defined and certainly do not yield a (7 X 4) matrix. 

This rule has various important implications, and at this stage in our 
argument we would draw attention to the fact that even when for two matrices 
A and B, both the matrix products AB and BA are defined, they are not 
usually equal. Indeed, the order of the two product matrices may be different, 
as the following example shows. If 



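$$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 0 & -1 \\ 4 & 1 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 1 & 2 & 1 \\ -1 & 1 & 0 \end{bmatrix}$$

then

$$\mathbf{AB} = \begin{bmatrix} -1 & 4 & 1 \\ 1 & -1 & 0 \\ 3 & 9 & 4 \end{bmatrix}$$

and

$$\mathbf{BA} = \begin{bmatrix} 5 & 1 \\ -1 & -3 \end{bmatrix}.$$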
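
Rule 1 and the non-commutative character of the product can both be seen in a short sketch. The code below is an illustration added here; the names `shape` and `matmul` are assumptions. It refuses to multiply matrices that are not conformable, and for a (3 x 2) matrix A and a (2 x 3) matrix B it shows that AB is (3 x 3) whilst BA is (2 x 2).

```python
def shape(M):
    # Order of a matrix stored as a list of rows: (rows, columns)
    return len(M), len(M[0])

def matmul(A, B):
    # Product AB, formed from inner products of rows of A with columns of B
    m, n = shape(A)
    p, q = shape(B)
    if n != p:
        raise ValueError(f"({m} x {n})({p} x {q}) is not conformable")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(m)]

A = [[1, 2], [0, -1], [4, 1]]
B = [[1, 2, 1], [-1, 1, 0]]

AB = matmul(A, B)
BA = matmul(B, A)
print(shape(AB), AB)
print(shape(BA), BA)
```

Calling `matmul` on a (7 x 3) and a (5 x 4) matrix raises the error, in agreement with the remark that such a product is not defined.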



A different but most important way in which matrices can arise is in 
dealing with sets of simultaneous equations. Consider the following set of 
simultaneous equations : 

$$\begin{aligned} x + y + 2z &= 4 \\ 2x - y + 3z &= 9 \\ 3x - y - z &= 2. \end{aligned}$$

These equations may be written in matrix form by introducing a column 
vector with entries x, y, z and then using the rule of matrix multiplication to 
write 



$$\begin{bmatrix} 1 & 1 & 2 \\ 2 & -1 & 3 \\ 3 & -1 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4 \\ 9 \\ 2 \end{bmatrix}.$$



With only a little practice, the reader will quickly learn to transcribe systems 
of equations into matrix form, for the patterns of numbers involved in the 
two numerical matrices are identical to the patterns of numbers in the 
equations themselves. 

For obvious reasons the (3 X 3) matrix is called the coefficient matrix of 
the simultaneous equations. As in this case there are three equations and 
three unknowns, the coefficient matrix is square in shape. In general the name 
square matrix will be given to any (n x n) matrix. If the coefficient matrix 
above is denoted by A, and the column vectors with entries x, y, z and 4, 9, 2 
are denoted, respectively, by X and K, we arrive at the matrix equation 

AX = K. 

There is a great temptation to attempt to solve this for X by dividing by 
A, but as it is meaningless to divide two arrays of numbers this approach 
must be abandoned. Later we will return to this matter and resolve the diffi- 
culty by introducing the concept of the inverse of a square matrix via the 
operation of multiplication. 

One final and important way in which matrices may arise is in connection 
with what are called linear transformations. The idea involved here is perhaps 
best understood if described in terms of coordinate transformations, and for 
this purpose we now confine attention to a special change of coordinates in 
a plane. 

Suppose a set of rectangular cartesian axes $O\{x', y'\}$ in a plane is derived
from a set of rectangular cartesian axes $O\{x, y\}$ by rotation about O through
an angle $\theta$. Then under this process a point P in the (x, y)-plane with
coordinates $(\xi, \eta)$ appears as a point with coordinates $(\xi', \eta')$ in the (x', y')-plane,
as shown in Fig. 9-1.

Simple geometrical considerations show that

$$\begin{aligned} \xi' &= \xi\cos\theta - \eta\sin\theta \\ \eta' &= \xi\sin\theta + \eta\cos\theta. \end{aligned}$$

Now this result is true for any point P in the (x, y)-plane and its map in the
(x', y')-plane, so that with complete generality we may display the effect of
this coordinate transformation by writing

$$\begin{aligned} x' &= x\cos\theta - y\sin\theta \\ y' &= x\sin\theta + y\cos\theta. \end{aligned} \tag{9-10}$$

Fig. 9-1 Rotation in a plane.

If the axes $O\{x', y'\}$ and $O\{x, y\}$ are thought of as belonging to two different
but superimposed planes with a common origin, then Eqns (9-10) may
be regarded as describing the relationship between points in each plane
when corresponding axes are inclined at an angle $\theta$. In this respect the
transformation described by Eqns (9-10) can be regarded as a function or
mapping, in the sense of Chapter 2, of the set of points comprising the
(x, y)-plane into the set of points comprising the (x', y')-plane. The mapping
is obviously one to one, and both the domain and range of the mapping are
the set of points comprising the plane itself. In matrix notation the relationship
becomes

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}. \tag{9-11}$$

Hence by pursuing the simple idea of the geometrical operation of the
rotation of a plane about the origin we have arrived at the matrix

$$\mathbf{R}_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \tag{9-12}$$



The idea involved here is a much more general one than that involved in
simultaneous equations, since $\mathbf{R}_\theta$ contains a complete description of how an
entire plane transforms or maps, together with whatever specific curves of
interest it may contain. In addition to this we have also produced an example
of a matrix whose entries, or elements as we shall call them henceforth, are
functions of a single real variable.

Accordingly, it is reasonable to ask whether any meaning can be given to
the entity $d\mathbf{R}_\theta/d\theta$, where $\mathbf{R}_\theta$ is a matrix whose elements are functions of the
real variable $\theta$. This is not an abstract matter, for in mechanics and many
other subjects it is frequently convenient to work with axes that are fixed in
a rotating body. Indeed, the same sort of idea was implicit in the example
first used to introduce matrices. In that case by regarding the quality of the
raw material as a function of the time t, we arrive at a (3 x 4) matrix A(t)
whose elements $a_{mn}(t)$ are functions of time and any attempt to examine
rates of change involves considering the meaning of dA/dt.

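The term linear transformation in relation to the rotation transformation
(9-11) comes about as follows. Consider the effect of a rotation on the two
points $(\alpha, \beta)$ and $(\gamma, \delta)$ which map into the points $(\alpha', \beta')$ and $(\gamma', \delta')$,
respectively. Then from Eqns (9-10) we have

$$\begin{aligned} \alpha' &= \alpha\cos\theta - \beta\sin\theta \\ \beta' &= \alpha\sin\theta + \beta\cos\theta \end{aligned} \qquad\text{and}\qquad \begin{aligned} \gamma' &= \gamma\cos\theta - \delta\sin\theta \\ \delta' &= \gamma\sin\theta + \delta\cos\theta, \end{aligned}$$

whence

$$\begin{aligned} \alpha' + \gamma' &= (\alpha + \gamma)\cos\theta - (\beta + \delta)\sin\theta \\ \beta' + \delta' &= (\alpha + \gamma)\sin\theta + (\beta + \delta)\cos\theta. \end{aligned}$$

So, setting

$$\mathbf{X} = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \qquad \mathbf{X}^* = \begin{bmatrix} \gamma \\ \delta \end{bmatrix}$$

we have in fact shown that

$$\mathbf{R}_\theta\mathbf{X} + \mathbf{R}_\theta\mathbf{X}^* = \mathbf{R}_\theta(\mathbf{X} + \mathbf{X}^*), \tag{9-13}$$

which asserts that multiplication by $\mathbf{R}_\theta$ is distributive with respect to addition.
It is the general property described by Eqn (9-13) that is used to characterize a
linear transformation, and it is on account of this that $\mathbf{R}_\theta\mathbf{X}$ is called a linear
transformation of the vector X. In fact matrix multiplication is always
distributive with respect to addition, as we shall see later.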
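
Property (9-13) is easily illustrated numerically. The sketch below is an addition made here, with assumed function names and arbitrarily chosen sample values: it builds the matrix of Eqn (9-12) for a given angle and compares the two sides of (9-13) for a pair of column vectors.

```python
import math

def rotation(theta):
    # Matrix of Eqn (9-12) for the angle theta (in radians)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(R, X):
    # Product of a (2 x 2) matrix with a (2 x 1) column stored as a list
    return [R[0][0] * X[0] + R[0][1] * X[1],
            R[1][0] * X[0] + R[1][1] * X[1]]

R = rotation(0.7)
X = [1.0, 2.0]
X_star = [-3.0, 0.5]

lhs = [u + v for u, v in zip(apply(R, X), apply(R, X_star))]
rhs = apply(R, [u + v for u, v in zip(X, X_star)])
print(lhs, rhs)
```

The two printed columns agree to within rounding error, whatever angle and vectors are chosen.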

Thus far in our introductory presentation of matrices only intuitive argu- 
ments have been used. This approach has been adopted deliberately in an 
attempt to emphasize that matrices arise naturally, and that an obvious 
algebra suggests itself for their manipulation. To proceed further it now 
becomes necessary to formalize these ideas in exact mathematical terms, and 
then to develop them in systematic form to the point at which they can be 
used as a useful tool. 

9-2 Matrix algebra 

In this section we return to the fundamental ideas connected with matrices 
and their algebra which were outlined on an intuitive basis in Section 9T. 
This time, however, our discussion will be more formal and, relying on our 
introductory account to provide motivation, we shall proceed quickly 
through the basic definitions and theorems, which will be illustrated by 
example. The problem of the solution of systems of linear equations and a 
discussion of linear transformations and some of their applications will be 






presented in subsequent sections. 

definition 9-1 (matrix and its order) A matrix is a rectangular array of
elements $a_{ij}$ involving m rows and n columns. The first suffix i in element $a_{ij}$ is
called the row index of the element and the second suffix j is called the column
index of the element. These indices specify the row number and column
number in which the element is located, with row 1 occurring at the top of
the array and column 1 at the extreme left. A matrix with m rows and n
columns is said to be of order m by n and this is written (m x n). The order
describes the shape of the matrix.

Special names are given to certain types of matrix and we now describe 
and give examples of some of the more frequently used terms. 

(a) A row matrix or row vector is any matrix of order (1 X n). The 
following is an example of a row vector of order (1 X 4) : 

[3 7 2]. 

(b) A column matrix or column vector is any matrix of order (n x 1). The 
following is an example of a column vector of order (3 x 1): 

11 



(c) A square matrix is any matrix of order (n x n). The following is an
example of a square matrix of order (3 x 3):

$$\begin{bmatrix} 1 & 2 & 4 \\ 3 & 0 & 2 \\ 5 & 1 & 3 \end{bmatrix}$$



Three particular cases of square matrices that are worthy of note are the 
diagonal matrix, the symmetric matrix and the skew-symmetric matrix. Of 
these, the diagonal matrix has non-zero elements only on what is called the 
principal diagonal, which runs from the top left of the matrix to the bottom 
right. The principal diagonal is also often referred to as the leading diagonal. 
The following is an example of a diagonal matrix of order (4x4): 



3 








0" 




















2 














5. 



The diagonal matrix in which every element of the principal diagonal is
unity is called either the unit matrix or the identity matrix, and it is usually
denoted by I. The unit matrix of order (3 x 3) thus has the form

$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$



A symmetric matrix is one in which the elements obey the rule $a_{ij} = a_{ji}$,
so that the pattern of numbers has a reflection symmetry about the principal
diagonal. A typical symmetric matrix of order (3 x 3) is:



2 
-2 



A skew-symmetric matrix is one in which the elements obey the rule
$a_{ij} = -a_{ji}$, so that the principal diagonal must contain zeros, whilst the
pattern of numbers has a reflection symmetry about the principal diagonal
but with a reversal of sign. A typical skew-symmetric matrix of order (3 x 3)
is:

$$\begin{bmatrix} 0 & 1 & 5 \\ -1 & 0 & -3 \\ -5 & 3 & 0 \end{bmatrix}$$



(d) A null matrix is the name given to a matrix of any order which
contains only zero elements. It is usually denoted by the symbol 0. The null
matrix of order (2 x 3) has the form

$$\mathbf{0} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$



definition 9-2 (equality of matrices) Two matrices A and B with general
elements $a_{ij}$ and $b_{ij}$, respectively, are equal only when they are both of the
same order and $a_{ij} = b_{ij}$ for all possible pairs of indices (i, j).

Example 9-1 Is it possible for the following pair of matrices to be equal
and, if so, for what value of a does equality occur:

$$\begin{bmatrix} 5 & a^3 \\ a^2 & 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 5 & -27 \\ 9 & 1 \end{bmatrix}$$

Solution The matrices are both of the same order and hence they will be
equal when their corresponding elements are equal. As corresponding
elements on the principal diagonal are indeed equal, we need only confine
attention to the off-diagonal elements. Thus the matrices will be equal if there is a
common solution to the two equations $a^2 = 9$ and $a^3 = -27$. Obviously,
equality will occur if $a = -3$.

definition 9-3 (addition of matrices) Two matrices A and B with general
elements $a_{ij}$ and $b_{ij}$, respectively, will be said to be conformable for addition
only if they are both of the same order. Their sum C = A + B is the matrix
C with elements $c_{ij} = a_{ij} + b_{ij}$.

As addition of real numbers is commutative we have $a_{ij} + b_{ij} = b_{ij} + a_{ij}$.
This shows that addition of conformable matrices must also be commutative,
whence

$$\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}. \qquad (9\text{-}14)$$

Now addition of real numbers is also associative so that
$(a_{ij} + b_{ij}) + c_{ij} = a_{ij} + (b_{ij} + c_{ij})$. Hence if $a_{ij}$, $b_{ij}$, and $c_{ij}$ are general elements of
matrices A, B, and C which are conformable for addition, then this also
implies that addition of matrices is associative, whence

$$(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C}). \qquad (9\text{-}15)$$

Results (9-14) and (9-15) comprise our first theorem.



theorem 9-1 (matrix addition is both commutative and associative) If
A, B, and C are matrices which are conformable for addition, then 

(a) A + B = B + A (Matrix Addition is Commutative) ; 

(b) (A + B) + C = A + (B + C) (Matrix Addition is Associative). 

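Example 9-2 Determine the constants a, b, c, and d in order that the
following matrix equation should be valid:

$$\begin{bmatrix} 0 & a & 3 \\ b & 2 & 2 \end{bmatrix} + \begin{bmatrix} c & 1 & 2 \\ 1 & 1 & d \end{bmatrix} = \begin{bmatrix} 4 & 3 & 5 \\ 7 & 3 & 5 \end{bmatrix}$$

Solution Adding the two matrices on the left-hand side we arrive at the
matrix equation

$$\begin{bmatrix} c & (a + 1) & 5 \\ (b + 1) & 3 & (d + 2) \end{bmatrix} = \begin{bmatrix} 4 & 3 & 5 \\ 7 & 3 & 5 \end{bmatrix}$$

Equating corresponding elements shows that a = 2, b = 6, c = 4, and
d = 3.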



definition 9-4 (multiplication by scalar) If k is a scalar and the matrix
A has elements $a_{ij}$, then the matrix B = kA is the same order as A and has
elements $ka_{ij}$.

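Example 9-3 Determine 2A + 5B, given that:

$$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad\text{and}\quad \mathbf{B} = \begin{bmatrix} -1 & 3 \\ 4 & 2 \end{bmatrix}.$$

Solution

$$2\mathbf{A} + 5\mathbf{B} = 2\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + 5\begin{bmatrix} -1 & 3 \\ 4 & 2 \end{bmatrix}$$

or,

$$2\mathbf{A} + 5\mathbf{B} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix} + \begin{bmatrix} -5 & 15 \\ 20 & 10 \end{bmatrix},$$

whence

$$2\mathbf{A} + 5\mathbf{B} = \begin{bmatrix} -3 & 19 \\ 26 & 18 \end{bmatrix}.$$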
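
Definitions 9-3 and 9-4 translate directly into element-by-element operations. The sketch below is a minimal illustration using nested Python lists; the function names `scalar_multiply` and `add_matrices` are assumptions chosen here for readability, and the data are those of Example 9-3.

```python
def scalar_multiply(k, A):
    """Return kA: every element of A multiplied by the scalar k."""
    return [[k * a for a in row] for row in A]

def add_matrices(A, B):
    """Return A + B, insisting that A and B are of the same order."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices are not conformable for addition")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2], [3, 4]]
B = [[-1, 3], [4, 2]]

# 2A + 5B, as in Example 9-3
result = add_matrices(scalar_multiply(2, A), scalar_multiply(5, B))

# Commutativity of addition, Theorem 9-1 (a)
commutes = add_matrices(A, B) == add_matrices(B, A)
```

Here `result` holds the rows [-3, 19] and [26, 18], in agreement with the worked solution.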



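definition 9-5 (difference of two matrices) If the matrices A and B are
both of the same order, then their difference A - B is defined by the relation

$$\mathbf{A} - \mathbf{B} = \mathbf{A} + (-1)\mathbf{B}.$$

Example 9-4 Determine A - B, given that:

$$\mathbf{A} = \begin{bmatrix} 1 & 3 \\ 4 & -2 \\ 1 & 6 \end{bmatrix} \quad\text{and}\quad \mathbf{B} = \begin{bmatrix} 4 & 2 \\ 3 & 1 \\ 0 & -2 \end{bmatrix}.$$

Solution

$$\mathbf{A} - \mathbf{B} = \begin{bmatrix} 1 & 3 \\ 4 & -2 \\ 1 & 6 \end{bmatrix} + (-1)\begin{bmatrix} 4 & 2 \\ 3 & 1 \\ 0 & -2 \end{bmatrix}$$

and so

$$\mathbf{A} - \mathbf{B} = \begin{bmatrix} 1 & 3 \\ 4 & -2 \\ 1 & 6 \end{bmatrix} + \begin{bmatrix} -4 & -2 \\ -3 & -1 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} -3 & 1 \\ 1 & -3 \\ 1 & 8 \end{bmatrix}.$$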



definition 9-6 (matrix multiplication) The two matrices A and B with
general elements $a_{ij}$ and $b_{ij}$ are said to be conformable for matrix
multiplication provided that the number of columns in A equals the number of rows in
B. If A is of order (m x n) and B is of order (n x r), then the matrix product
AB is the matrix C of order (m x r) with elements $c_{ij}$, where

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$

The number $c_{ij}$ is called the inner product of the ith row of A with the jth
column of B.



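Example 9-5 Determine A + BC, given that:

$$\mathbf{A} = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} 1 & 4 & 2 \\ 2 & 1 & 1 \end{bmatrix} \quad\text{and}\quad \mathbf{C} = \begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 2 \end{bmatrix}.$$

Solution Matrix B is of order (2 x 3) and matrix C is of order (3 x 2),
showing that B and C are conformable for forming the product BC. We have

$$\mathbf{BC} = \begin{bmatrix} 1 & 4 & 2 \\ 2 & 1 & 1 \end{bmatrix}\begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 7 & 8 \\ 7 & 10 \end{bmatrix}$$

and so

$$\mathbf{A} + \mathbf{BC} = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix} + \begin{bmatrix} 7 & 8 \\ 7 & 10 \end{bmatrix} = \begin{bmatrix} 8 & 12 \\ 9 & 13 \end{bmatrix}.$$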
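
The inner-product rule of Definition 9-6 can be written out as a short routine. What follows is only a sketch in plain Python, with `matmul` a name assumed here; it uses the matrices of Example 9-5, with the zero elements of C taken to be those required by the stated product BC.

```python
def matmul(A, B):
    """Return AB, where c_ij is the inner product of row i of A with column j of B."""
    n = len(B)                      # number of rows of B
    if any(len(row) != n for row in A):
        raise ValueError("matrices are not conformable for multiplication")
    r = len(B[0])                   # number of columns of B
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
            for i in range(len(A))]

A = [[1, 4], [2, 3]]
B = [[1, 4, 2], [2, 1, 1]]
C = [[3, 4], [1, 0], [0, 2]]

BC = matmul(B, C)                                   # order (2 x 2)
A_plus_BC = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, BC)]

CB = matmul(C, B)                                   # order (3 x 3), so CB differs from BC
```

The product `BC` has rows [7, 8] and [7, 10], and `A_plus_BC` has rows [8, 12] and [9, 13]. Reversing the factors gives a matrix of a different order, which is one simple way of seeing that matrix multiplication is not commutative in general.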



On account of the fact that matrix multiplication is not normally
commutative, it is important to use a terminology that distinguishes between
matrix multipliers that appear on the left or the right in a matrix product.
This is achieved by adopting the convention that when matrix B is multiplied
by matrix A from the left to form the product AB, we shall say that B is
pre-multiplied by A. Conversely, when the matrix B is multiplied by A from the
right to form the product BA, we shall say that B is post-multiplied by A.

The most important results concerning matrix multiplication are con- 
tained in the following theorem, which asserts that matrix multiplication is 
distributive with respect to addition and that it is also associative. 

theorem 9-2 (matrix multiplication is distributive and associative) If 
matrices A, B, and C are conformable for multiplication, then : 

(a) matrix multiplication is distributive with respect to addition, so that 

A(B + C) = AB + AC; 




(b) matrix multiplication is associative, so that 

A(BC) = (AB)C. 

Proof To establish result (a) let B and C be of order (m x n), and denote
their general elements by $b_{ij}$ and $c_{ij}$, respectively, so that the general element
of B + C is $b_{ij} + c_{ij}$. Then if A is of order (r x m) with general element $a_{ij}$,
and $d_{ij}$ is the general element of D = A(B + C) which is of order (r x n),
we have from Definition 9-6 that

$$d_{ij} = a_{i1}(b_{1j} + c_{1j}) + a_{i2}(b_{2j} + c_{2j}) + \cdots + a_{im}(b_{mj} + c_{mj}).$$

Performing the indicated multiplications and re-grouping we have

$$d_{ij} = (a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}) + (a_{i1}c_{1j} + a_{i2}c_{2j} + \cdots + a_{im}c_{mj}).$$

However, from Definition 9-6 this is seen to be equivalent to

$$\mathbf{D} = \mathbf{AB} + \mathbf{AC},$$

which was to be proved.

Result (b) may be established in similar fashion, and to achieve this we
assume A, B, and C to be respectively of order (p x q), (q x m), and (m x n)
with general elements $a_{ij}$, $b_{ij}$, and $c_{ij}$.

From Definition 9-6 we know that the general element occurring in row i,
column j of the product BC has the form

$$b_{i1}c_{1j} + b_{i2}c_{2j} + \cdots + b_{im}c_{mj},$$

so that the general element $d_{ij}$ occurring in row i, column j of the product
D = A(BC) which is of order (p x n) must have the form

$$\begin{aligned}
d_{ij} = {} & a_{i1}(b_{11}c_{1j} + b_{12}c_{2j} + \cdots + b_{1m}c_{mj}) \\
& + a_{i2}(b_{21}c_{1j} + b_{22}c_{2j} + \cdots + b_{2m}c_{mj}) \\
& + \cdots \\
& + a_{iq}(b_{q1}c_{1j} + b_{q2}c_{2j} + \cdots + b_{qm}c_{mj}).
\end{aligned}$$

Re-grouping of the terms then gives

$$\begin{aligned}
d_{ij} = {} & (a_{i1}b_{11} + a_{i2}b_{21} + \cdots + a_{iq}b_{q1})c_{1j} \\
& + (a_{i1}b_{12} + a_{i2}b_{22} + \cdots + a_{iq}b_{q2})c_{2j} \\
& + \cdots \\
& + (a_{i1}b_{1m} + a_{i2}b_{2m} + \cdots + a_{iq}b_{qm})c_{mj}.
\end{aligned}$$

Appealing once more to Definition 9-6 we find that this is equivalent to

$$\mathbf{D} = (\mathbf{AB})\mathbf{C},$$

which was to be proved.



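Example 9-6 If

$$\mathbf{A} = \begin{bmatrix} 1 & 2 \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}, \quad \mathbf{C} = \begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix},$$

verify that

(a) A(B + C) = AB + AC,

(b) A(BC) = (AB)C.

Solution
(a) We have

$$\mathbf{B} + \mathbf{C} = \begin{bmatrix} 3 & 4 \\ 2 & 3 \end{bmatrix}$$

so that

$$\mathbf{A}(\mathbf{B} + \mathbf{C}) = \begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} 3 & 4 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 7 & 10 \end{bmatrix};$$

whereas

$$\mathbf{AB} = \begin{bmatrix} -1 & 7 \end{bmatrix} \quad\text{and}\quad \mathbf{AC} = \begin{bmatrix} 8 & 3 \end{bmatrix},$$

so that

$$\mathbf{AB} + \mathbf{AC} = \begin{bmatrix} 7 & 10 \end{bmatrix}.$$

(b) We have

$$\mathbf{BC} = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 11 & 4 \\ 4 & 1 \end{bmatrix}$$

so that

$$\mathbf{A}(\mathbf{BC}) = \begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} 11 & 4 \\ 4 & 1 \end{bmatrix} = \begin{bmatrix} 19 & 6 \end{bmatrix};$$

whereas

$$\mathbf{AB} = \begin{bmatrix} 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 7 \end{bmatrix},$$

whence

$$(\mathbf{AB})\mathbf{C} = \begin{bmatrix} -1 & 7 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 19 & 6 \end{bmatrix}.$$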
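
The two parts of Theorem 9-2 can also be checked mechanically for the matrices of Example 9-6. The Python sketch below is illustrative only; `matmul` and `add` are helper names assumed here, and a check on one set of matrices is of course no substitute for the proof given above.

```python
def matmul(A, B):
    """Matrix product by the inner-product rule of Definition 9-6."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("matrices are not conformable for multiplication")
    # zip(*B) yields the columns of B one at a time.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def add(A, B):
    """Element-by-element sum of two matrices of the same order."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2]]                 # row vector of order (1 x 2)
B = [[1, 3], [-1, 2]]
C = [[2, 1], [3, 1]]

left_distributive = matmul(A, add(B, C))               # A(B + C)
right_distributive = add(matmul(A, B), matmul(A, C))   # AB + AC
left_associative = matmul(A, matmul(B, C))             # A(BC)
right_associative = matmul(matmul(A, B), C)            # (AB)C
```

Both distributive expressions evaluate to the single row [7, 10], and both associative expressions to [19, 6].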






An important matrix operation involves the interchange of rows and
columns of a matrix, thereby changing a matrix of order (m x n) into one of
order (n x m). Thus a row vector is changed into a column vector and a
matrix of order (3 x 2) is changed into a matrix of order (2 x 3). This
operation is called the operation of transposition and is denoted by the
addition of a prime to the matrix in question.

definition 9-7 (transposition operation) If A is a matrix of order (m x n),
then its transpose A' is the matrix of order (n x m) which is derived from A
by the interchange of rows and columns. Symbolically, if $a_{ij}$ is the element in
the ith row and jth column of A, then $a_{ji}$ is the element in the corresponding
position in A'.

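Example 9-7 Find A' and (A')', given that:

$$\mathbf{A} = \begin{bmatrix} 1 & 4 & 7 & 3 \\ 2 & -1 & 4 & -1 \end{bmatrix}.$$

Solution Writing the first row in place of the first column and the second
row in place of the second column, as is required by Definition 9-7, we find
that

$$\mathbf{A}' = \begin{bmatrix} 1 & 2 \\ 4 & -1 \\ 7 & 4 \\ 3 & -1 \end{bmatrix}.$$

The same argument shows that

$$(\mathbf{A}')' = \begin{bmatrix} 1 & 4 & 7 & 3 \\ 2 & -1 & 4 & -1 \end{bmatrix}.$$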

It is obvious from the definition of the transpose operation that (A')' = A,
as was indeed illustrated in the last example. It is also obvious from
Definitions 9-3 and 9-5 that if A and B are conformable for addition, then

$$(\mathbf{A} \pm \mathbf{B})' = \mathbf{A}' \pm \mathbf{B}'. \qquad (9\text{-}16)$$

Now if A is of order (m x n) and B is of order (n x r), and the general
matrix elements are $a_{ij}$ and $b_{ij}$, respectively, the element $c_{ij}$ in the ith row and
jth column of the matrix product C = AB is

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$

By definition, this is the element that will appear in the jth row and ith
column of (AB)'.

Applying the transpose operation separately to A and B we find that A'
is of order (n x m) and B' is of order (r x n), so that only the matrix product
B'A' is conformable.

Now the elements of the jth row of B' are the elements of the jth column
of B, and the elements of the ith column of A' are the elements of the ith
row of A, so that the element $d_{ji}$ in the jth row and ith column of the product
D = B'A' must be

$$d_{ji} = b_{1j}a_{i1} + b_{2j}a_{i2} + \cdots + b_{nj}a_{in}$$

or, equivalently,

$$d_{ji} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$

However, equating elements in the jth row and ith column of (AB)' and
B'A' we find that $c_{ij} = d_{ji}$, and so

$$(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'. \qquad (9\text{-}17)$$

We summarize these results into a final theorem. 

theorem 9-3 (properties of transposition operation) If A and B are con- 
formable for addition or multiplication, as required, then : 

(a) (A')' = A (Transposition is Reflexive); 

(b) (A + B)' = A' + B'; 

(c) (A - B)' = A' - B'; 

(d) (AB)' = B'A'. 

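Example 9-8 Verify that (AB)' = B'A', given that:

$$\mathbf{A} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \quad\text{and}\quad \mathbf{B} = \begin{bmatrix} 2 & -1 \\ 3 & 1 \end{bmatrix}.$$

Solution We have

$$\mathbf{AB} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 2 & -1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 11 & 2 \\ 16 & 2 \end{bmatrix}$$

so that

$$(\mathbf{AB})' = \begin{bmatrix} 11 & 16 \\ 2 & 2 \end{bmatrix}.$$

However,

$$\mathbf{B}'\mathbf{A}' = \begin{bmatrix} 2 & 3 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 11 & 16 \\ 2 & 2 \end{bmatrix},$$

which is equal to (AB)'.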
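
The transposition rules of Theorem 9-3 lend themselves to a quick numerical check. The following sketch, in which `transpose` and `matmul` are names assumed purely for illustration, repeats Example 9-8 and also confirms that transposing twice recovers the original matrix.

```python
def transpose(A):
    """Interchange rows and columns, as in Definition 9-7."""
    return [list(column) for column in zip(*A)]

def matmul(A, B):
    """Matrix product by the inner-product rule of Definition 9-6."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("matrices are not conformable for multiplication")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 3], [2, 4]]
B = [[2, -1], [3, 1]]

lhs = transpose(matmul(A, B))               # (AB)'
rhs = matmul(transpose(B), transpose(A))    # B'A'
twice = transpose(transpose(A))             # (A')'
```

Both `lhs` and `rhs` have rows [11, 16] and [2, 2], and `twice` coincides with `A`. Note the reversed order of the factors on the right-hand side, which is what makes the product conformable in the rectangular case.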






9-3 Determinants 

The notion of a determinant, when first introduced in Chapter 4, was that of 
a single number associated with a square array of numbers. In its subsequent 
application in that chapter it was used in a subsidiary role to simplify the 
manipulation of the vector product, and in that capacity it gave rise to a 
vector. The determinant made yet another appearance in Chapter 5 when, in 
connection with the change of variable in partial differentiation, it contained 
functions as elements, and was called a Jacobian. In this role it is often called 
a functional determinant, and it gives rise to a function that is closely related
to the one-to-one nature of the change of variables involved. 

These are but two of the situations in which determinants occur in 
different branches of mathematics, and it is the object of this section to 
examine some of the most important algebraic properties of determinants. 
Our results will only be proved for determinants of order 3 but they are, in 
fact, all true for determinants of any order. 

We begin by rewriting Definition 4-16 using the matrix element notation
as follows : 

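definition 9-8 (third order determinant) Let A be the square matrix of
order (3 x 3)

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Then the expression

$$|\mathbf{A}| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

is called the third order determinant associated with the square matrix A,
and it is defined to be the number

$$|\mathbf{A}| = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$

where for any numbers a, b, c, and d,

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$

The notation det A is also frequently used in place of | A | to signify the
determinant of A.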






This definition has a number of consequences of considerable value in
simplifying the manipulation of determinants. Let us confine attention to the
third order determinant which is typical of all orders of determinant, and
expand the last line of Definition 9-8. We have

$$\begin{aligned}
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= {} & a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{12}a_{23}a_{31} \\
& - a_{12}a_{21}a_{33} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}, \qquad (9\text{-}18)
\end{aligned}$$

showing that one, and only one, element of each row and each column of the
determinant appears in each of the products on the right-hand side defining
| A |. Hence, if any row or column of a determinant is multiplied by a factor
$\lambda$, then the value of the determinant is multiplied by $\lambda$, since a factor $\lambda$ will
appear in each product on the right-hand side of Eqn (9-18). Conversely,
if any row or column of a determinant is divided by a factor $\lambda$, then the value
of the determinant is divided by $\lambda$. It is also obvious from Eqn (9-18) that
| A | = 0 if all the elements of a row or column of | A | are zero, or if all the
corresponding elements of two rows or columns of | A | are equal.
Suppose, for example, that $\lambda = 3$ and

$$|\mathbf{A}| = \begin{vmatrix} 1 & 2 & 3 \\ 2 & 1 & 1 \\ 4 & 1 & 2 \end{vmatrix}.$$

Then it is easily shown that | A | = -5, so that 3 | A | = -15. Now this
result could have been obtained equally well by using the above argument
and multiplying any row or any column of | A | by 3. If the first row of | A | is
multiplied by 3 we have

$$3|\mathbf{A}| = \begin{vmatrix} 3 & 6 & 9 \\ 2 & 1 & 1 \\ 4 & 1 & 2 \end{vmatrix} = -15$$

or, alternatively, if the third column is multiplied by 3 we have

$$3|\mathbf{A}| = \begin{vmatrix} 1 & 2 & 9 \\ 2 & 1 & 3 \\ 4 & 1 & 6 \end{vmatrix} = -15.$$



It is readily verified from Eqn (9-18) that interchanging any two rows or
columns of | A | changes its sign. Thus we have

$$\begin{vmatrix} 1 & 4 & 3 \\ 2 & 1 & 4 \\ 9 & 4 & -6 \end{vmatrix} = -\begin{vmatrix} 1 & 3 & 4 \\ 2 & 4 & 1 \\ 9 & -6 & 4 \end{vmatrix}$$

in which the determinant on the left has been obtained from the one on the
right by interchanging the second and third columns.

A particularly simple case arises when | A | is the determinant associated
with a diagonal matrix A, for then all off-diagonal elements are automatically
zero. This implies that Eqn (9-18) reduces to $|\mathbf{A}| = a_{11}a_{22}a_{33}$, which is just
the product of the elements of the principal diagonal. Thus if

$$|\mathbf{A}| = \begin{vmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 4 \end{vmatrix}$$

then | A | = (3)(-2)(4) = -24.

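Another useful result is that the value of a determinant is unchanged when
elements of a row (or column) have added to them some multiple of the
corresponding elements of some other row (or column). We prove this result
by direct expansion in the following typical case. Consider the determinant
| D | obtained from | A | by adding to the elements of column 3 of | A |, $\lambda$
times the corresponding elements in column 2 of | A | to obtain:

$$|\mathbf{D}| = \begin{vmatrix} a_{11} & a_{12} & a_{13} + \lambda a_{12} \\ a_{21} & a_{22} & a_{23} + \lambda a_{22} \\ a_{31} & a_{32} & a_{33} + \lambda a_{32} \end{vmatrix}$$

Then at once Definition 9-8 asserts that

$$\begin{aligned}
|\mathbf{D}| = {} & a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
+ a_{11}\begin{vmatrix} a_{22} & \lambda a_{22} \\ a_{32} & \lambda a_{32} \end{vmatrix}
- a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} \\
& - a_{12}\begin{vmatrix} a_{21} & \lambda a_{22} \\ a_{31} & \lambda a_{32} \end{vmatrix}
+ a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
+ \lambda a_{12}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.
\end{aligned}$$

Now the second term on the right-hand side is zero, whilst the fourth and
last terms cancel leaving only three remaining terms. These are seen to
comprise the definition of | A |, so that we have proved that | D | = | A | or, in
symbols, that

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} + \lambda a_{12} \\ a_{21} & a_{22} & a_{23} + \lambda a_{22} \\ a_{31} & a_{32} & a_{33} + \lambda a_{32} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.$$

A similar result would have been obtained had different columns been used
or, indeed, had rows been used instead of columns.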

An obvious implication of this result is that if a row (or column) of a 
determinant is expressible as the sum of multiples of other rows (or columns) 
of the determinant, then the value of the determinant must be zero. This is so 
because by subtraction of this sum of multiples of other rows (or columns) 
from the row (or column) in question, it is possible to produce a row (or 
column) containing only zero elements. 

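Let us illustrate how a determinant may be simplified by means of this
result. Consider the determinant

$$|\mathbf{A}| = \begin{vmatrix} 7 & 18 & 8 \\ 1 & 5 & 7 \\ 3 & 9 & 4 \end{vmatrix}.$$

Subtracting twice the third row from the first row we find

$$|\mathbf{A}| = \begin{vmatrix} 1 & 0 & 0 \\ 1 & 5 & 7 \\ 3 & 9 & 4 \end{vmatrix},$$

whence | A | = -43.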

Let us summarize our findings in the form of a theorem. 

theorem 9-4 (properties of determinants) 

(a) A determinant in which all the elements of a row or column are zero, 
itself has the value zero ; 

(b) A determinant in which all corresponding elements in two rows (or 
columns) are equal has the value zero ; 

(c) If the elements of a row (or column) of a determinant are multiplied
by a factor $\lambda$, then the value of the determinant is multiplied by $\lambda$;

(d) The value of a determinant associated with a diagonal matrix is 
equal to the product of the elements on the principal diagonal ; 

(e) The value of a determinant is unaltered by adding to the elements of 
any row (or column), a constant multiple of the corresponding elements 
of any other row (or column) ; 

(f) If a row (or column) of a determinant is expressible as the sum of 
multiples of other rows (or columns) of the determinant, then its value is 
zero. 
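
Several parts of Theorem 9-4 can be spot-checked with the third order determinants used earlier in this section. The sketch below evaluates a determinant exactly as in Definition 9-8; the names `det2` and `det3` and the choice of test cases are assumptions made here for illustration.

```python
def det2(a, b, c, d):
    """Second order determinant | a b ; c d | = ad - bc."""
    return a * d - b * c

def det3(A):
    """Third order determinant expanded by the first row, as in Definition 9-8."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = A
    return (a11 * det2(a22, a23, a32, a33)
            - a12 * det2(a21, a23, a31, a33)
            + a13 * det2(a21, a22, a31, a32))

A = [[1, 2, 3], [2, 1, 1], [4, 1, 2]]
base = det3(A)

# (c) multiplying the first row by 3 multiplies the determinant by 3
scaled = det3([[3, 6, 9], [2, 1, 1], [4, 1, 2]])

# (b) two equal rows give the value zero
equal_rows = det3([[1, 2, 3], [1, 2, 3], [4, 1, 2]])

# interchanging two rows changes the sign
swapped = det3([A[1], A[0], A[2]])

# (e) subtracting twice row 3 from row 1 leaves the value unaltered
E = [[7, 18, 8], [1, 5, 7], [3, 9, 4]]
reduced = [[E[0][j] - 2 * E[2][j] for j in range(3)], E[1], E[2]]
```

With these data `base` is -5, `scaled` is -15, `equal_rows` is 0, `swapped` is 5, and both `det3(E)` and `det3(reduced)` are -43.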



Higher order determinants can be defined with exactly similar properties
to those enumerated in the theorem above. Thus the determinant | A | of
order n associated with the square matrix A of order (n x n) has n! terms in
its expansion, each of which contains one, and only one, element from each
row and column of A.

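definition 9-9 (fourth order determinant) If A is the square matrix of
order (4 x 4)

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$$

then the expression

$$|\mathbf{A}| = \begin{vmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix}$$

is called the fourth order determinant associated with the square matrix A,
and it is defined to be the number

$$\begin{aligned}
|\mathbf{A}| = {} & a_{11}\begin{vmatrix} a_{22} & a_{23} & a_{24} \\ a_{32} & a_{33} & a_{34} \\ a_{42} & a_{43} & a_{44} \end{vmatrix}
- a_{12}\begin{vmatrix} a_{21} & a_{23} & a_{24} \\ a_{31} & a_{33} & a_{34} \\ a_{41} & a_{43} & a_{44} \end{vmatrix} \\
& + a_{13}\begin{vmatrix} a_{21} & a_{22} & a_{24} \\ a_{31} & a_{32} & a_{34} \\ a_{41} & a_{42} & a_{44} \end{vmatrix}
- a_{14}\begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{vmatrix}.
\end{aligned}$$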

An inductive argument applied to Definitions 9-8 and 9-9 shows one way
in which higher order determinants may be defined, but clearly our notation
needs some simplification to avoid unwieldy expressions of the type given
above. This is achieved by the introduction of the minor and the cofactor of
an element of a square matrix.

definition 9-10 (minors and cofactors) Let A be a square matrix of
order (n x n) with general element $a_{ij}$, and let | A | be the determinant of
order n associated with A. Denote by $M_{ij}$ the determinant of order (n - 1)
associated with the matrix of order ((n - 1) x (n - 1)) derived from A by the
deletion of row i and column j. Then $M_{ij}$ is called the minor of the element
$a_{ij}$ of A, and $A_{ij} = (-1)^{i+j}M_{ij}$ is called the cofactor of the element $a_{ij}$ of A.



Example 9-9 Find the minors and cofactors of the matrix

$$\mathbf{A} = \begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 4 \\ 1 & 2 & 1 \end{bmatrix}.$$

Solution The minor $M_{11}$ is derived from A by deleting row 1 and column 1
and equating $M_{11}$ to the determinant formed by the remaining elements.
That is,

$$M_{11} = \begin{vmatrix} 1 & 4 \\ 2 & 1 \end{vmatrix} = -7.$$

Similarly, minor $M_{12}$ is derived from A by deleting row 1 and column 2 and
equating $M_{12}$ to the determinant formed by the remaining elements. That is,

$$M_{12} = \begin{vmatrix} 2 & 4 \\ 1 & 1 \end{vmatrix} = -2.$$

Identical reasoning then shows that $M_{13} = 3$, $M_{21} = -6$, $M_{22} = -2$,
$M_{23} = 2$, $M_{31} = -3$, $M_{32} = -2$, and $M_{33} = 1$. As the cofactors
$A_{ij} = (-1)^{i+j}M_{ij}$, it follows that $A_{11} = -7$, $A_{12} = 2$, $A_{13} = 3$, $A_{21} = 6$, $A_{22} = -2$,
$A_{23} = -2$, $A_{31} = -3$, $A_{32} = 2$, and $A_{33} = 1$.

If A is a square matrix with general element $a_{ij}$ and corresponding
cofactor $A_{ij}$, it is easily seen that:

(a) if A is of order (2 x 2), then $|\mathbf{A}| = a_{11}A_{11} + a_{12}A_{12}$,

(b) if A is of order (3 x 3), then $|\mathbf{A}| = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13}$,

(c) if A is of order (4 x 4), then $|\mathbf{A}| = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13} + a_{14}A_{14}$.

This suggests that if A is of order (n x n), then for | A | we could adopt the
definition

$$|\mathbf{A}| = a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n}. \qquad (9\text{-}19)$$

This is a true statement and could be accepted as a definition, but it is
not the most general one which may be adopted. To see this we return to
Eqn (9-18) and re-arrange the terms on the right-hand side to give

$$|\mathbf{A}| = a_{31}(a_{12}a_{23} - a_{13}a_{22}) - a_{32}(a_{11}a_{23} - a_{13}a_{21}) + a_{33}(a_{11}a_{22} - a_{12}a_{21}).$$

Hence, working backwards, we have

$$|\mathbf{A}| = a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} - a_{32}\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix} + a_{33}\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$$

thereby showing that it is also true that

$$|\mathbf{A}| = a_{31}A_{31} + a_{32}A_{32} + a_{33}A_{33}. \qquad (9\text{-}20)$$

We now have two equivalent but different looking expressions for | A | 
either of which could be taken as the definition of | A | . The expression in 
(b) above involves the elements and cofactors of the first row of A and the 
expression in Eqn (9-20) involves the elements and cofactors of the third row 
of A. A repetition of this argument involving other rearrangements of the 
terms of Eqn (9-18) shows that | A | may be evaluated as the sum of the 
products of the elements and their cofactors of any row or column of A. This 
very valuable and general result is known as the Laplace expansion theorem, 
and it is true for determinants of any order though we have only proved it 
for a third order determinant. Let us state this result formally as it would 
apply to a determinant of order n. 

theorem 9-5 (Laplace expansion theorem) The determinant | A | associated
with any (n x n) square matrix A is obtained by summing the products of
the elements and their cofactors in any row or column of A. If A has the
general element $a_{ij}$ and the corresponding cofactor is $A_{ij}$, then this result is
equivalent to:

Expansion by elements of a row

$$|\mathbf{A}| = \sum_{j=1}^{n} a_{ij}A_{ij}$$

for i = 1, 2, . . ., n;

Expansion by elements of a column

$$|\mathbf{A}| = \sum_{i=1}^{n} a_{ij}A_{ij}$$

for j = 1, 2, . . ., n.

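Example 9-10 Evaluate the determinant

$$|\mathbf{A}| = \begin{vmatrix} 1 & 4 & 2 \\ 3 & -2 & 1 \\ 1 & 5 & 2 \end{vmatrix}$$

by expanding it (a) in terms of the elements of row 2, and (b) in terms of the
elements of column 3.

Solution

$$\text{(a)}\quad |\mathbf{A}| = -3\begin{vmatrix} 4 & 2 \\ 5 & 2 \end{vmatrix} - 2\begin{vmatrix} 1 & 2 \\ 1 & 2 \end{vmatrix} - 1\begin{vmatrix} 1 & 4 \\ 1 & 5 \end{vmatrix} = 5$$

$$\text{(b)}\quad |\mathbf{A}| = 2\begin{vmatrix} 3 & -2 \\ 1 & 5 \end{vmatrix} - 1\begin{vmatrix} 1 & 4 \\ 1 & 5 \end{vmatrix} + 2\begin{vmatrix} 1 & 4 \\ 3 & -2 \end{vmatrix} = 5.$$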
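
Theorem 9-5 says that every row and every column expansion yields the same number, and this is easy to confirm for the determinant of Example 9-10. The Python sketch below is an illustration under stated assumptions: the function names are invented here, indices are counted from 0 as is usual in Python (so row 2 of the text is index 1), and the recursive evaluation is meant for small matrices only.

```python
def minor_matrix(A, i, j):
    """Matrix left after deleting row i and column j of A (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion along the first row, as in Eqn (9-19)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j))
               for j in range(len(A)))

def cofactor(A, i, j):
    """Cofactor A_ij = (-1)^(i+j) M_ij of Definition 9-10."""
    return (-1) ** (i + j) * det(minor_matrix(A, i, j))

def expand_row(A, i):
    """Sum of elements of row i times their own cofactors."""
    return sum(A[i][j] * cofactor(A, i, j) for j in range(len(A)))

def expand_column(A, j):
    """Sum of elements of column j times their own cofactors."""
    return sum(A[i][j] * cofactor(A, i, j) for i in range(len(A)))

A = [[1, 4, 2], [3, -2, 1], [1, 5, 2]]
row_values = [expand_row(A, i) for i in range(3)]
column_values = [expand_column(A, j) for j in range(3)]
```

All six expansions give the value 5 found in the worked solution.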



An important extension of Theorem 9-5 asserts that the sum of the 
products of the elements of any row (or column) of a square matrix A with 
the cofactors corresponding to the elements of a different row (or column) is 
zero. This is easily proved as follows. 

Let A be a matrix of order (n x n), and let B be obtained from A by
replacing row q of A by row p. Then B has the elements of rows p and q equal,
so that by Theorem 9-4 (b) it follows that | B | = 0. Expanding | B | in
terms of elements of row q by Theorem 9-5 we then find

$$|\mathbf{B}| = a_{p1}A_{q1} + a_{p2}A_{q2} + \cdots + a_{pn}A_{qn} = 0,$$

which was to be proved. A similar argument establishes the corresponding
result for columns and so we have proved our assertion.

theorem 9-6 The sum of the products of the elements of any row (or
column) of a square matrix A with the cofactors corresponding to the
elements of a different row (or column) is zero. Symbolically, if $a_{ij}$ is the general
element of A and $A_{ij}$ is its cofactor, then:

Expansion by elements of a row

$$\sum_{i=1}^{n} a_{pi}A_{qi} = 0$$

if $p \neq q$; and

Expansion by elements of a column

$$\sum_{i=1}^{n} a_{ip}A_{iq} = 0$$

if $p \neq q$.

Example 9-11 Verify that the sum of the products of the elements of
column 1 and the corresponding cofactors of column 2 of the following
matrix is zero:

$$\mathbf{A} = \begin{bmatrix} 1 & 3 & 2 \\ 4 & 1 & 2 \\ 3 & 1 & 3 \end{bmatrix}.$$

Solution The elements of column 1 are $a_{11} = 1$, $a_{21} = 4$, $a_{31} = 3$. The
cofactors corresponding to the elements of the second column are $A_{12} = -6$,
$A_{22} = -3$, $A_{32} = 6$. Hence

$$a_{11}A_{12} + a_{21}A_{22} + a_{31}A_{32} = (1)(-6) + (4)(-3) + (3)(6) = 0.$$
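
Theorem 9-6 may be contrasted with Theorem 9-5 by computing both sums for the matrix of Example 9-11. In the sketch below the helper names are assumptions introduced for illustration and indices are counted from 0, so that columns 1 and 2 of the text are indices 0 and 1.

```python
def minor_matrix(A, i, j):
    """Matrix left after deleting row i and column j of A (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j))
               for j in range(len(A)))

def cofactor(A, i, j):
    """Cofactor A_ij = (-1)^(i+j) M_ij."""
    return (-1) ** (i + j) * det(minor_matrix(A, i, j))

A = [[1, 3, 2], [4, 1, 2], [3, 1, 3]]

column2_cofactors = [cofactor(A, i, 1) for i in range(3)]

# Elements of column 1 with the cofactors of column 2 (Theorem 9-6)
alien_sum = sum(A[i][0] * column2_cofactors[i] for i in range(3))

# Elements of column 2 with their own cofactors (Theorem 9-5)
own_sum = sum(A[i][1] * column2_cofactors[i] for i in range(3))
```

The cofactors of the second column come out as -6, -3, and 6, the sum with the elements of column 1 is 0, and the sum with the elements of column 2 equals the determinant of A.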

9-4 Linear dependence and linear independence

We are now in a position to discuss the important idea of linear independence. 
This concept has already been used implicitly in Chapter 4 when the three 
mutually orthogonal unit vectors i, j, and k were introduced comprising what 
in linear algebra is called a basis for the vector space. By this we mean that 
all other vectors are expressible in terms of the vectors comprising the basis 
through the operations of scaling and vector addition, but that no member 
of the basis itself is expressible in terms of the other members of the basis. 
Thus no choice of the scalars $\lambda$, $\mu$ can ever make the vectors i and $\lambda$j + $\mu$k
equal. It is in this sense that the unit vectors i, j, k comprising the basis for 
ordinary vector analysis are linearly independent, and obviously any other 
set of unit vectors a, b, c which are not co-planar, and no two of which are 
parallel, would serve equally well as a basis for this space. 

The same idea carries across to matrices when the term vector is inter- 
preted to mean either a matrix row vector or a matrix column vector. Thus 
the three column vectors 



Ci = 



c 2 = 



and C3 = 



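are not linearly independent because $C_3 = C_1 + 2C_2$, whereas the three row
vectors

$$R_1 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}, \quad R_2 = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}, \quad\text{and}\quad R_3 = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}$$

are obviously linearly independent, because no choice of the scalars $\lambda$, $\mu$ can
ever make the vectors $R_1$ and $\lambda R_2 + \mu R_3$ equal. It is these ideas that underlie
the formulation of the following definition.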



definition 9-11 (linear dependence and linear independence) The set of
n matrix row or column vectors $V_1, V_2, \ldots, V_n$ which are conformable for
addition will be said to be linearly dependent if there exist n scalars
$\alpha_1, \alpha_2, \ldots, \alpha_n$, not all zero, such that

$$\alpha_1 V_1 + \alpha_2 V_2 + \cdots + \alpha_n V_n = 0.$$

When no such set of scalars exists, so that this relationship is only true
when $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$, then the n matrix vectors $V_1, V_2, \ldots, V_n$
will be said to be linearly independent.

In the event that the n matrix vectors in Definition 9-11 represent the
rows or columns of a rectangular matrix A, the linear dependence or
independence of the vectors $V_1, V_2, \ldots, V_n$ becomes a statement about the
linear dependence or independence of the rows or columns of A. In particular,
if A is a square matrix, and linear dependence exists between its rows (or
columns), then by definition it is possible to express at least one row (or
column) of A as the sum of multiples of the other rows (or columns). Thus
from Theorem 9-4 (f), we see that linear dependence amongst the rows
or columns of a square matrix A implies the condition | A | = 0. Similarly,
if $| A | \neq 0$ then the rows and columns of A cannot be linearly dependent.

theorem 9-7 (test for linear independence) The rows and columns of a
square matrix A are linearly independent if, and only if, $| A | \neq 0$. Conversely,
linear dependence is implied between rows or columns of a square matrix
A if | A | = 0.



Example 9-12 Test the following matrices for linear independence between
rows or columns:

$$\mathbf{A} = \begin{bmatrix} 1 & 4 & 3 \\ 2 & 10 & 7 \\ 4 & -6 & 1 \end{bmatrix} \quad\text{and}\quad \mathbf{B} = \begin{bmatrix} 1 & 1 & 0 \\ 3 & 2 & 1 \\ 1 & 1 & 3 \end{bmatrix}.$$

Solution We shall apply Theorem 9-7 by examining | A | and | B |. A simple
calculation shows that | A | = 0, so that linear dependence exists between
either the rows or the columns of A. In fact, denoting the columns of A by
$C_1$, $C_2$, and $C_3$, we have $C_2 = 2(C_3 - C_1)$. As | B | = -3 the rows and
columns of B are linearly independent.
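
The test of Theorem 9-7 amounts to evaluating a determinant and comparing it with zero. The sketch below applies it to the two matrices of Example 9-12, taking the element in row 2, column 2 of A to be 10, the value consistent with the relation $C_2 = 2(C_3 - C_1)$ quoted in the solution. The names `det3`, `independent`, and `column` are assumptions made here, and exact integer arithmetic is used; with floating-point entries a tolerance would be needed in place of the comparison with zero.

```python
def det3(M):
    """Third order determinant expanded by the first row (Definition 9-8)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def independent(M):
    """Rows and columns are linearly independent exactly when |M| is non-zero."""
    return det3(M) != 0

def column(M, j):
    """Return column j of M as a list (0-based index)."""
    return [row[j] for row in M]

A = [[1, 4, 3], [2, 10, 7], [4, -6, 1]]
B = [[1, 1, 0], [3, 2, 1], [1, 1, 3]]

c1, c2, c3 = (column(A, j) for j in range(3))

# The dependence relation C2 = 2(C3 - C1) between the columns of A
relation_holds = c2 == [2 * (z - x) for x, z in zip(c1, c3)]
```

For these data `det3(A)` is 0 and `det3(B)` is -3, so `independent(A)` is false while `independent(B)` is true.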

Let us now give consideration to any linear independence that may exist
between the rows or columns of a rectangular matrix A of order $(m \times n)$. If
r rows (or columns) of A are linearly independent, where $r \le \min(m, n)$,
then Theorem 9-7 implies that there is at least one determinant of order r
that may be formed by taking these r rows (or columns) which is non-zero,
but that all determinants of order greater than r must of necessity vanish.
This number r is called the rank of the matrix A, and it represents the greatest
number of linearly independent rows or columns existing in A. If, for example,
A is a square matrix of order $(n \times n)$ and $|A| \neq 0$, this implies that the
rank of A must be n.



406 / LINEAR TRANSFORMATIONS AND MATRICES 



CH 9 



definition 9-12 (rank of a matrix) The rank r of a matrix A is the greatest
number of linearly independent rows or columns that exist in the matrix A.
Numerically, r is equal to the order of the largest order non-vanishing
determinant $|B|$ associated with any square matrix B which can be constructed
from A by combination of r rows and r columns.

Example 9-13 Find the rank of the following matrix:

\[
A = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & -1 & 1 \\ -3 & 0 & 1 & -1 & 0 \end{bmatrix}.
\]



Solution The largest order of determinant that can be constructed in this 
case from the rows and columns of A is 3. As there is certainly one such 
determinant that is non-vanishing, namely the one associated with the first 
three columns of A, the rank of A must be 3. The fact that other non-vanishing 
determinants of order three may be constructed from A is immaterial (e.g., 
take the last three columns). 
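For larger matrices it is usually easier to find the rank by row reduction than by searching for a non-vanishing determinant. The sketch below is an illustration only: the function name `rank` is an assumption, exact rational arithmetic is used to avoid rounding, and the matrix is the one of Example 9-13 as read from the text.

```python
from fractions import Fraction


def rank(matrix):
    """Rank found by forward elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        # Look for a non-zero entry in this column at or below row r.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


A = [[1, 0, 1, 0, 1], [1, 1, 1, -1, 1], [-3, 0, 1, -1, 0]]
print(rank(A))
```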

9-5 Inverse and adjoint matrix

The operation of division is not defined for matrices, but a multiplicative
inverse matrix denoted by $A^{-1}$ can be defined for any square matrix A for
which $|A| \neq 0$. This multiplicative inverse $A^{-1}$ is unique and has the
property that

\[ A^{-1}A = AA^{-1} = I \]

where I is the unit matrix, and it is defined in terms of what is called the
matrix adjoint to A. The uniqueness follows from the fact that if B and C are
each inverse to A, then B(AC) = (BA)C, so that BI = IC, or B = C.

definition 9-13 (adjoint matrix) Let A be a square matrix, then the
transpose of the matrix of cofactors of A is called the matrix adjoint to A, 
and it is denoted by adj A. A square matrix and its adjoint are both of the 
same order. 



Example 9-14 Find the matrix adjoint to:

\[
A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 1 & 0 \\ 2 & 1 & 2 \end{bmatrix}.
\]



Solution The cofactors $A_{ij}$ of A are: $A_{11} = 2$, $A_{12} = -6$, $A_{13} = 1$,
$A_{21} = -3$, $A_{22} = 0$, $A_{23} = 3$, $A_{31} = -1$, $A_{32} = 3$, and $A_{33} = -5$.
Hence the matrix of cofactors has the form

\[
\begin{bmatrix} 2 & -6 & 1 \\ -3 & 0 & 3 \\ -1 & 3 & -5 \end{bmatrix}
\]

so that its transpose, which by definition is adj A, is

\[
\operatorname{adj} A = \begin{bmatrix} 2 & -3 & -1 \\ -6 & 0 & 3 \\ 1 & 3 & -5 \end{bmatrix}.
\]

Now from Theorems 9-5 and 9-6, we see that the effect of forming either 
the product (adj A)A or the product A(adj A) is to produce a diagonal 
matrix in which each element of the leading diagonal is | A |. That is, we 
have shown that 



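\[
(\operatorname{adj} A)A = A(\operatorname{adj} A) =
\begin{bmatrix}
|A| & 0 & \cdots & 0 \\
0 & |A| & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & |A|
\end{bmatrix},
\tag{9-21}
\]

whence

\[ (\operatorname{adj} A)A = A(\operatorname{adj} A) = |A|\,I. \tag{9-22} \]

Thus, provided $|A| \neq 0$, by writing

\[ A^{-1} = \frac{\operatorname{adj} A}{|A|} \]

we arrive at the result

\[ A^{-1}A = AA^{-1} = I. \tag{9-23} \]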

The matrix $A^{-1}$ is called the matrix inverse to A and it is only defined for
square matrices A for which $|A| \neq 0$. A square matrix whose associated
determinant is non-vanishing is called a non-singular matrix. Although the
inverse matrix is only defined for non-singular square matrices, the adjoint
matrix is defined for any square matrix, irrespective of whether or not it is
non-singular.

definition 9-14 (inverse matrix) If A is a square matrix for which
$|A| \neq 0$, the matrix inverse to A which is denoted by $A^{-1}$ is defined by the
relationship

\[ A^{-1} = \frac{\operatorname{adj} A}{|A|}. \]



Example 9-15 Find the matrix inverse to the matrix A of Example 9-14 
above. 

Solution It is easily found from the cofactors already computed that
$|A| = -9$. This follows, for example, by expanding $|A|$ in terms of
elements of the first row to obtain $|A| = (1)(2) + (2)(-6) + (1)(1) = -9$.
Hence from Definition 9-14, we have

\[
A^{-1} = \frac{\operatorname{adj} A}{|A|}
= (-1/9)\begin{bmatrix} 2 & -3 & -1 \\ -6 & 0 & 3 \\ 1 & 3 & -5 \end{bmatrix}
= \begin{bmatrix} -2/9 & 1/3 & 1/9 \\ 2/3 & 0 & -1/3 \\ -1/9 & -1/3 & 5/9 \end{bmatrix}.
\]



The steps in the determination of an inverse matrix are perhaps best 
remembered in the form of a rule. 

Rule 2 (Determination of inverse matrix) 

To determine the matrix $A^{-1}$ which is inverse to the square matrix A proceed
as follows:

(a) Construct the matrix of cofactors of A;

(b) Transpose the matrix of cofactors of A to obtain adj A;

(c) Calculate $|A|$ and, if it is not zero, divide adj A by $|A|$ to obtain
$A^{-1}$;

(d) If $|A| = 0$, then $A^{-1}$ is not defined.
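The steps of Rule 2 translate almost directly into code. The following is a small sketch under stated assumptions: the function names `minor`, `det`, `adjoint` and `inverse` are illustrative, the determinant is found by recursive expansion along the first row (adequate only for small matrices), and integer entries are assumed so that exact fractions can be used. The matrix is that of Example 9-14.

```python
from fractions import Fraction


def minor(m, i, j):
    """Matrix m with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by expansion along the first row (empty matrix -> 1)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def adjoint(m):
    """Steps (a) and (b): transpose of the matrix of cofactors."""
    n = len(m)
    cof = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
           for i in range(n)]
    return [list(col) for col in zip(*cof)]


def inverse(m):
    """Steps (c) and (d): divide adj A by |A|, or fail if |A| = 0."""
    d = det(m)
    if d == 0:
        raise ValueError("|A| = 0, so the inverse is not defined")
    return [[Fraction(x, d) for x in row] for row in adjoint(m)]


A = [[1, 2, 1], [3, 1, 0], [2, 1, 2]]
A_inv = inverse(A)
for row in A_inv:
    print([str(x) for x in row])
```

Multiplying `A` by `A_inv` entry by entry gives the unit matrix, which is a convenient check on any hand calculation.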

It is a trivial consequence of Definition 9-14 and the fact that for any
square matrix A, $|A| = |A'|$ (see Problem 9-34), that

\[ (A^{-1})' = (A')^{-1}. \tag{9-24} \]

Also, if A and B are non-singular matrices of the same order, then

\[ (B^{-1}A^{-1})AB = I = AB(B^{-1}A^{-1}), \]

showing that

\[ (AB)^{-1} = B^{-1}A^{-1}. \tag{9-25} \]

Accepting the result of Problem 9-35 as being valid for square matrices
A, B of arbitrary order $(n \times n)$, so that $|AB| = |A||B|$, we are able to
prove another useful result concerning the inverse matrix. If $|A| \neq 0$, then
$AA^{-1} = I$ showing that $|AA^{-1}| = 1$, or $|A||A^{-1}| = 1$. It follows from
this that:

\[ |A| = 1/|A^{-1}|. \tag{9-26} \]

One final result follows directly from the obvious fact that $(A^{-1})^{-1}A^{-1} = I$,
which is always true provided $|A^{-1}| \neq 0$. If we post-multiply this result
by A we find

\[ (A^{-1})^{-1}A^{-1}A = IA \]

giving

\[ (A^{-1})^{-1}I = A, \]

whence

\[ (A^{-1})^{-1} = A. \tag{9-27} \]

theorem 9-8 (properties of inverse matrix) If A and B are non-singular
square matrices of the same order, then:

(a) $AA^{-1} = A^{-1}A = I$;

(b) $(AB)^{-1} = B^{-1}A^{-1}$;

(c) $(A^{-1})' = (A')^{-1}$;

(d) $(A^{-1})^{-1} = A$;

(e) $|A| = 1/|A^{-1}|$.



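Example 9-16 Verify that $(A^{-1})' = (A')^{-1}$, given that

\[ A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}. \]

Solution We have

\[ A^{-1} = \begin{bmatrix} -2 & 3/2 \\ 1 & -1/2 \end{bmatrix}, \]

so that

\[ (A^{-1})' = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}. \]

However,

\[ A' = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \]

so that

\[ (A')^{-1} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}, \]

confirming that $(A^{-1})' = (A')^{-1}$.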



9-6 Matrix functions of a single variable 

All the matrix results that have been obtained so far are equally valid whether
applied to matrices whose elements are numerical constants, or to matrices
whose elements are functions of a single variable t. When the latter is the
case it is convenient to copy the notation for a function used hitherto, and
to represent the matrix by writing $A(t)$. In many respects it is convenient to
regard all matrices in this manner, since matrices with constant number
elements correspond to the subset of all possible matrices $A(t)$ in which all
elements are constant functions.

When the elements of $A(t)$ are all differentiable with respect to t in some
interval, it is reasonable to define a derivative of $A(t)$ with respect to t, and
for this purpose we shall work with the following definition.

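definition 9-15 (derivative of a matrix) Let $A(t)$ be a matrix of order
$(m \times n)$ whose elements $a_{ij}(t)$ are all differentiable functions of t in some
common interval $t_0 < t < t_1$. Then the derivative of $A(t)$ with respect to t in
$t_0 < t < t_1$, written $dA/dt$, is defined to be the matrix of order $(m \times n)$ with
elements $da_{ij}/dt$. The matrix $A(t)$ will be said to be differentiable in
$t_0 < t < t_1$. Symbolically this result becomes:

\[
\frac{d}{dt}
\begin{bmatrix}
a_{11}(t) & a_{12}(t) & \cdots & a_{1n}(t) \\
a_{21}(t) & a_{22}(t) & \cdots & a_{2n}(t) \\
\vdots & \vdots & & \vdots \\
a_{m1}(t) & a_{m2}(t) & \cdots & a_{mn}(t)
\end{bmatrix}
=
\begin{bmatrix}
\dfrac{da_{11}}{dt} & \dfrac{da_{12}}{dt} & \cdots & \dfrac{da_{1n}}{dt} \\
\dfrac{da_{21}}{dt} & \dfrac{da_{22}}{dt} & \cdots & \dfrac{da_{2n}}{dt} \\
\vdots & \vdots & & \vdots \\
\dfrac{da_{m1}}{dt} & \dfrac{da_{m2}}{dt} & \cdots & \dfrac{da_{mn}}{dt}
\end{bmatrix}
\]

for $t_0 < t < t_1$.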

Example 9-17 Find $dA/dt$ given that:

\[
A(t) = \begin{bmatrix} \cosh t & \sin t & \cosh 2t \\ \sinh t & \cos t & \sinh 2t \end{bmatrix}.
\]

Solution From Definition 9-15 we have at once:

\[
\frac{dA}{dt} = \begin{bmatrix} \sinh t & \cos t & 2\sinh 2t \\ \cosh t & -\sin t & 2\cosh 2t \end{bmatrix}
\]

for all t.

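If $a_{ij}(t)$ and $b_{ij}(t)$ are differentiable functions in some common interval
$t_0 < t < t_1$, then we know from the work of Chapter 5 that

\[ \frac{d}{dt}(a_{ij} \pm b_{ij}) = \frac{da_{ij}}{dt} \pm \frac{db_{ij}}{dt}, \]

and so

\[
\frac{d}{dt}(a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj})
= \left(\frac{da_{i1}}{dt}b_{1j} + \frac{da_{i2}}{dt}b_{2j} + \cdots + \frac{da_{in}}{dt}b_{nj}\right)
+ \left(a_{i1}\frac{db_{1j}}{dt} + a_{i2}\frac{db_{2j}}{dt} + \cdots + a_{in}\frac{db_{nj}}{dt}\right).
\]

Consequently, it then follows directly from Definitions 9-3 to 9-6 that for
suitably conformable matrices A and B:

\[ \frac{d}{dt}(A \pm B) = \frac{dA}{dt} \pm \frac{dB}{dt}; \tag{9-28} \]

\[ \frac{d}{dt}(\lambda A) = \lambda\frac{dA}{dt}, \quad \text{for any constant scalar } \lambda; \tag{9-29} \]

and

\[ \frac{d}{dt}(AB) = \frac{dA}{dt}B + A\frac{dB}{dt}. \tag{9-30} \]

Notice that in general $dA^2/dt \neq 2A(dA/dt)$, for setting B = A in Eqn (9-30)
yields

\[ \frac{dA^2}{dt} = \frac{dA}{dt}A + A\frac{dA}{dt}. \tag{9-31} \]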
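The remark that $dA^2/dt$ differs in general from $2A(dA/dt)$ can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the text: the matrix $A(t)$ is chosen arbitrarily for the purpose, the helper names are hypothetical, and the derivative of $A^2$ is estimated by a central difference, so agreement is only to within a small tolerance.

```python
def A(t):
    """Illustrative matrix function A(t) (an arbitrary choice)."""
    return [[t, 1.0], [-1.0, t * t]]


def dA(t):
    """Element-by-element derivative of A(t), as in Definition 9-15."""
    return [[1.0, 0.0], [0.0, 2.0 * t]]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def add(p, q):
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(p, q)]


def numerical_derivative(f, t, h=1e-5):
    """Central-difference estimate of d/dt of a matrix-valued function."""
    fp, fm = f(t + h), f(t - h)
    return [[(a - b) / (2 * h) for a, b in zip(rp, rm)] for rp, rm in zip(fp, fm)]


t0 = 2.0
lhs = numerical_derivative(lambda t: matmul(A(t), A(t)), t0)       # dA^2/dt
rhs = add(matmul(dA(t0), A(t0)), matmul(A(t0), dA(t0)))            # Eqn (9-31)
naive = [[2 * x for x in row] for row in matmul(A(t0), dA(t0))]    # 2A(dA/dt)

print("dA^2/dt   ~", lhs)
print("Eqn (9-31) ", rhs)
print("2A(dA/dt)  ", naive)
```

The first two results agree to within the finite-difference error, whereas the third differs in its off-diagonal elements because A and dA/dt do not commute here.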



It also follows that if K is a constant matrix in the sense that its elements are
constant functions of t, then

\[ \frac{dK}{dt} = 0. \tag{9-32} \]



Using the results of Theorems 9-3 (d) and 9-8 (b) together with Eqn (9-30),
we can derive two useful results. The first result applies to any two matrices
A, B which are conformable for multiplication and is

\[
\frac{d}{dt}(AB)' = \frac{d}{dt}(B'A') = \frac{dB'}{dt}A' + B'\frac{dA'}{dt};
\tag{9-33}
\]

the second result applies to any two non-singular square matrices which are
conformable for multiplication and is

\[
\frac{d}{dt}(AB)^{-1} = \frac{d}{dt}(B^{-1}A^{-1}) = \frac{dB^{-1}}{dt}A^{-1} + B^{-1}\frac{dA^{-1}}{dt}.
\tag{9-34}
\]

We now summarize these results in the form of a general theorem.



theorem 9-9 (properties of matrix differentiation) Let $A(t)$ and $B(t)$ be
suitably conformable matrices which are differentiable in some common
interval $t_0 < t < t_1$, and let K be a constant matrix and $\lambda$ a scalar. Then
throughout the interval $t_0 < t < t_1$:

(a) $\dfrac{d}{dt}(A + B) = \dfrac{dA}{dt} + \dfrac{dB}{dt}$;

(b) $\dfrac{d}{dt}(A - B) = \dfrac{dA}{dt} - \dfrac{dB}{dt}$;

(c) $\dfrac{d}{dt}(\lambda A) = \lambda\dfrac{dA}{dt}$; ($\lambda$ a constant scalar)

(d) $\dfrac{d}{dt}(AB) = \dfrac{dA}{dt}B + A\dfrac{dB}{dt}$;

(e) $\dfrac{dK}{dt} = 0$; (K a constant matrix)

(f) $\dfrac{d}{dt}(AB)' = \dfrac{dB'}{dt}A' + B'\dfrac{dA'}{dt}$;

(g) $\dfrac{d}{dt}(AB)^{-1} = \dfrac{dB^{-1}}{dt}A^{-1} + B^{-1}\dfrac{dA^{-1}}{dt}$,
where A and B are non-singular matrices.
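Example 9-18 Verify Eqn (9-33) for the matrices

\[
A(t) = \begin{bmatrix} t & 1 \\ -1 & t^2 \end{bmatrix}
\quad\text{and}\quad
B(t) = \begin{bmatrix} 2 & t^2 \\ t^3 & 1 \end{bmatrix}.
\]

Solution We have

\[ AB = \begin{bmatrix} 2t + t^3 & 1 + t^3 \\ t^5 - 2 & 0 \end{bmatrix} \]

so that

\[ (AB)' = \begin{bmatrix} 2t + t^3 & t^5 - 2 \\ 1 + t^3 & 0 \end{bmatrix} \]

and thus

\[ \frac{d}{dt}(AB)' = \begin{bmatrix} 2 + 3t^2 & 5t^4 \\ 3t^2 & 0 \end{bmatrix}. \]

Now,

\[
A'(t) = \begin{bmatrix} t & -1 \\ 1 & t^2 \end{bmatrix}
\quad\text{and}\quad
B'(t) = \begin{bmatrix} 2 & t^3 \\ t^2 & 1 \end{bmatrix}
\]

so that

\[
\frac{dA'}{dt} = \begin{bmatrix} 1 & 0 \\ 0 & 2t \end{bmatrix}
\quad\text{and}\quad
\frac{dB'}{dt} = \begin{bmatrix} 0 & 3t^2 \\ 2t & 0 \end{bmatrix}.
\]

Using these results we have

\[
\frac{dB'}{dt}A' + B'\frac{dA'}{dt}
= \begin{bmatrix} 0 & 3t^2 \\ 2t & 0 \end{bmatrix}\begin{bmatrix} t & -1 \\ 1 & t^2 \end{bmatrix}
+ \begin{bmatrix} 2 & t^3 \\ t^2 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 2t \end{bmatrix}
\]

\[
= \begin{bmatrix} 3t^2 & 3t^4 \\ 2t^2 & -2t \end{bmatrix}
+ \begin{bmatrix} 2 & 2t^4 \\ t^2 & 2t \end{bmatrix}
= \begin{bmatrix} 2 + 3t^2 & 5t^4 \\ 3t^2 & 0 \end{bmatrix}
= \frac{d}{dt}(AB)'.
\]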



9-7 Solution of systems of linear equations

A system of m linear inhomogeneous equations in the n variables $x_1, x_2,
\ldots, x_n$ has the general form

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= k_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= k_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= k_m,
\end{aligned}
\tag{9-35}
\]

where the term inhomogeneous refers to the fact that not all of the numbers
$k_1, k_2, \ldots, k_m$ are zero. Defining the matrices

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix},
\quad
X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\quad\text{and}\quad
K = \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_m \end{bmatrix},
\]

this system can be written

\[ AX = K. \tag{9-36} \]

Here A is called the coefficient matrix, X the solution vector, and K the
inhomogeneous vector.

In the event that m = n and $|A| \neq 0$ it follows that $A^{-1}$ exists, so that
pre-multiplication of Eqn (9-36) by $A^{-1}$ gives for the solution vector,

\[ X = A^{-1}K. \tag{9-37} \]



This method of solution is of more theoretical than practical interest because
the task of computing $A^{-1}$ becomes prohibitive when n is much greater than
three. However, one useful method of solution for small systems of such
equations ($n < 4$) known as Cramer's rule may be deduced from Eqn (9-37).
Consideration of Eqn (9-37) and Definition 9-14 shows that $x_i$, the ith
element in the solution vector X, is given by

\[ x_i = \frac{k_1A_{1i} + k_2A_{2i} + \cdots + k_nA_{ni}}{|A|} \tag{9-38} \]

for $i = 1, 2, \ldots, n$, where $A_{ij}$ is the cofactor of A corresponding to element
$a_{ij}$. Using Laplace's expansion theorem we then see that the numerator of
Eqn (9-38) is simply the expansion of $|A_i|$, where $A_i$ denotes the matrix
derived from A by replacing the ith column of A by the column vector K.
Thus we have derived the simple result

\[ x_i = \frac{|A_i|}{|A|} \quad \text{for } i = 1, 2, \ldots, n, \tag{9-39} \]

which expresses the elements of the solution vector X of Eqn (9-35) in terms
of determinants.



Rule 3 (Cramer's rule) 

To solve n linear inhomogeneous equations in n variables proceed as follows:

(a) Compute $|A|$ the determinant of the coefficient matrix and, if
$|A| \neq 0$, proceed to the next step;

(b) Compute the modified coefficient determinants $|A_i|$, $i = 1, 2, \ldots,
n$ where $A_i$ is derived from A by replacing the ith column of A by the
inhomogeneous vector K;

(c) Then the solutions $x_1, x_2, \ldots, x_n$ are given by

\[ x_i = \frac{|A_i|}{|A|} \quad \text{for } i = 1, 2, \ldots, n; \]

(d) If $|A| = 0$ the method fails.



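Example 9-19 Use Cramer's rule to solve the equations:

\[
\begin{aligned}
x_1 + 3x_2 + x_3 &= 8 \\
2x_1 + x_2 + 3x_3 &= 7 \\
x_1 + x_2 - x_3 &= 2.
\end{aligned}
\]

Solution The coefficient matrix A and the modified coefficient matrices
$A_1$, $A_2$, and $A_3$ are obviously:

\[
A = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 1 & 3 \\ 1 & 1 & -1 \end{bmatrix},
\quad
A_1 = \begin{bmatrix} 8 & 3 & 1 \\ 7 & 1 & 3 \\ 2 & 1 & -1 \end{bmatrix},
\quad
A_2 = \begin{bmatrix} 1 & 8 & 1 \\ 2 & 7 & 3 \\ 1 & 2 & -1 \end{bmatrix}
\quad\text{and}\quad
A_3 = \begin{bmatrix} 1 & 3 & 8 \\ 2 & 1 & 7 \\ 1 & 1 & 2 \end{bmatrix}.
\]

Hence $|A| = 12$, $|A_1| = 12$, $|A_2| = 24$, and $|A_3| = 12$, so that

\[
x_1 = \frac{|A_1|}{|A|} = 1, \quad
x_2 = \frac{|A_2|}{|A|} = 2, \quad
x_3 = \frac{|A_3|}{|A|} = 1.
\]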
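Cramer's rule (Rule 3) can be sketched in a few lines of code. This is an illustration only: the names `det` and `cramer` are assumptions, the determinant is computed by recursive expansion along the first row (which is why the method is sensible only for small n), and integer coefficients are assumed. The data are those of Example 9-19.

```python
from fractions import Fraction


def det(m):
    """Determinant by expansion along the first row (empty matrix -> 1)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def cramer(a, k):
    """Solve AX = K by Cramer's rule; fails when |A| = 0 (step (d))."""
    d = det(a)
    if d == 0:
        raise ValueError("|A| = 0: Cramer's rule fails")
    solution = []
    for i in range(len(a)):
        # A_i: replace the ith column of A by the inhomogeneous vector K.
        a_i = [row[:i] + [k[r]] + row[i + 1:] for r, row in enumerate(a)]
        solution.append(Fraction(det(a_i), d))
    return solution


A = [[1, 3, 1], [2, 1, 3], [1, 1, -1]]
K = [8, 7, 2]
print([str(x) for x in cramer(A, K)])
```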






In the more general case in which m = n, but $|A| = 0$, the inverse
matrix does not exist and so any method using $A^{-1}$ must fail. In these
circumstances we must consider more carefully what is meant by a solution. In
general, when a solution vector X exists whose elements simultaneously
satisfy all the equations in the system, the equations will be said to be
consistent. If no solution vector exists having this property then the equations
will be said to be inconsistent. Consider the following equations:

\[
\begin{aligned}
x_1 + x_2 + 2x_3 &= 9 \\
4x_1 - 2x_2 + x_3 &= 4 \\
5x_1 - x_2 + 3x_3 &= 1.
\end{aligned}
\]

These equations are obviously inconsistent, because the left-hand side of the
third equation is just the sum of the left-hand sides of the first two equations,
whereas the right-hand sides are not so related (that is, $1 \neq 9 + 4$). In effect,
what we are saying is that there is a linear dependence between the rows of
the left-hand side of the equations which is not shared by the inhomogeneous
terms. The row linear dependence in the coefficient matrix A is obviously
dependent upon the rank of A and we now offer a brief discussion of one way
in which the general problem of consistency may be approached.

Obviously, when working conventionally with the individual equations 
comprising (9-35) we know that: (a) equations may be scaled, (b) equations 
may be interchanged, and (c) multiples of one equation may be added to 
another. This implies that if we consider the coefficient matrix A of the system 
and supplement it on the right by the elements of the inhomogeneous vector 
K to form what is called the augmented matrix, then these same operations 
are valid for the rows of the augmented matrix. Clearly, the rank will not be 
affected by these operations. If the ranks of A and of the augmented matrix 
denoted by (A, K) are the same, then the equations must be consistent; 
otherwise they must be inconsistent. 

definition 9-16 (augmented matrix and elementary row operations)
Suppose that AX = K, where

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix},
\quad
X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\quad\text{and}\quad
K = \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix}.
\]

Then the augmented matrix, written (A, K), is defined to be the matrix

\[
(A, K) = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & k_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & k_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} & k_n
\end{bmatrix}.
\]

An elementary row operation performed on an augmented matrix is any
one of the following:

(a) scaling of all elements in a row by a factor $\lambda$;

(b) interchange of any two rows;

(c) addition of a multiple of one row to another row.

An augmented matrix will be said to have been reduced to echelon form by
elementary row operations when the first non-zero element in any row is a
unity, and it lies to the right of the unity in the row above.



Example 9-20 Perform elementary row operations on the augmented 
matrix corresponding to the inconsistent equations above to reduce them to 
echelon form. Find the ranks of A and (A, K). 

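Solution The augmented matrix is

\[
(A, K) = \begin{bmatrix} 1 & 1 & 2 & 9 \\ 4 & -2 & 1 & 4 \\ 5 & -1 & 3 & 1 \end{bmatrix}.
\]

Subtract from the elements of row 3 the sum of the corresponding elements
in rows 1 and 2 to obtain

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 4 & -2 & 1 & 4 \\ 0 & 0 & 0 & -12 \end{bmatrix}.
\]

Subtract from the elements of row 2, four times the corresponding elements
in row 1 to obtain

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & -6 & -7 & -32 \\ 0 & 0 & 0 & -12 \end{bmatrix}.
\]

Divide row 2 by $-6$ and row 3 by $-12$ to obtain

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & 7/6 & 16/3 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\]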

This is now in echelon form and the rank of the matrix comprising the first 
three columns is 2, which must be the same as the rank of the coefficient 
matrix A. The rank of (A, K) must be the same as the rank of the echelon 
equivalent of the augmented matrix which is clearly 3. 

The general conclusion that may be reached from the echelon form of an
augmented matrix (A, K), is that equations are consistent only when the ranks
of A and (A, K) are the same. If the equations are consistent, and A is of
order $(n \times n)$ and the rank $r < n$, we shall have fewer equations than
variables. In these circumstances we may solve for any r of the variables $x_i$ in
terms of the $n - r$ remaining ones which can then be assigned arbitrary
values.



theorem 9-10 (solution of inhomogeneous systems) The inhomogeneous
system of equations

\[ AX = K, \]

where A is of order $(n \times n)$ and X, K are of order $(n \times 1)$ has a unique
solution if $|A| \neq 0$. If $|A| = 0$, then the equations are only consistent when the
ranks of A and (A, K) are equal. In this case, if the rank $r < n$, it is possible to
solve for r variables in terms of the $n - r$ remaining variables which may then
be assigned arbitrary values.

Example 9-21 Solve the following equations by reducing the augmented
matrix to echelon form:

\[
\begin{aligned}
x_1 + 3x_2 - x_3 &= 6 \\
8x_1 + 9x_2 + 4x_3 &= 21 \\
2x_1 + x_2 + 2x_3 &= 3.
\end{aligned}
\]

Solution The augmented matrix is

\[
(A, K) = \begin{bmatrix} 1 & 3 & -1 & 6 \\ 8 & 9 & 4 & 21 \\ 2 & 1 & 2 & 3 \end{bmatrix}.
\]

Subtract from the elements of row 2, the sum of three times the
corresponding element in row 3 and twice the corresponding element in row 1 to
obtain

\[
\begin{bmatrix} 1 & 3 & -1 & 6 \\ 0 & 0 & 0 & 0 \\ 2 & 1 & 2 & 3 \end{bmatrix}.
\]

Interchange rows two and three to obtain

\[
\begin{bmatrix} 1 & 3 & -1 & 6 \\ 2 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]

Subtract twice row 1 from row 2 and divide the resulting row 2 by $-5$ to
obtain

\[
\begin{bmatrix} 1 & 3 & -1 & 6 \\ 0 & 1 & -4/5 & 9/5 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]

This is now in echelon form and clearly the ranks of A and (A, K) are both 2
showing that the equations are consistent. However, only two equations
exist between the three variables $x_1$, $x_2$, and $x_3$, for the echelon form of the
augmented matrix may be seen to be equivalent to the two scalar equations

\[ x_1 + 3x_2 - x_3 = 6 \quad\text{and}\quad x_2 - \tfrac{4}{5}x_3 = \tfrac{9}{5}. \]

Hence, assigning $x_3$ arbitrarily, we find that

\[ x_1 = \tfrac{3}{5} - \tfrac{7}{5}x_3 \quad\text{and}\quad x_2 = \tfrac{9}{5} + \tfrac{4}{5}x_3. \]
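The rank criterion of Theorem 9-10 is easy to test by machine. The sketch below is illustrative and rests on assumptions: the function names `rank` and `is_consistent` are not from the text, and the rank is found by forward elimination with exact fractions rather than by reduction to the strict echelon form of Definition 9-16 (the rank is the same either way). The two systems are those of Examples 9-20 and 9-21.

```python
from fractions import Fraction


def rank(matrix):
    """Rank found by forward elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_consistent(a, k):
    """AX = K is consistent only when rank A equals rank (A, K)."""
    augmented = [row + [b] for row, b in zip(a, k)]
    return rank(a) == rank(augmented)


# Example 9-20 (inconsistent) and Example 9-21 (consistent, rank 2).
A1, K1 = [[1, 1, 2], [4, -2, 1], [5, -1, 3]], [9, 4, 1]
A2, K2 = [[1, 3, -1], [8, 9, 4], [2, 1, 2]], [6, 21, 3]

print(is_consistent(A1, K1), is_consistent(A2, K2))
```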

When the inhomogeneous vector K = 0, the resulting system of equations
AX = 0 is said to be homogeneous. Consider the case of a homogeneous
system of n equations involving the n variables $x_1, x_2, \ldots, x_n$. Then it is
obvious that a trivial solution $x_1 = x_2 = \cdots = x_n = 0$ corresponding to
X = 0 always exists, but a non-trivial solution, in the sense that not all
$x_1, x_2, \ldots, x_n$ are zero, can only occur if $|A| = 0$. To see this notice that if
$|A| \neq 0$ then $A^{-1}$ exists, so that premultiplication of AX = 0 by $A^{-1}$ gives
at once the trivial solution X = 0 as being the only possible solution.
Conversely, if $|A| = 0$, then certainly at least one row of A is linearly
dependent upon the other rows, showing that not all of the variables $x_1,
x_2, \ldots, x_n$ can be zero.

When a non-trivial solution exists to a homogeneous system of n
equations involving n variables it cannot be unique, for if X is a solution vector,
then so also is $\lambda X$, where $\lambda$ is a scalar. As in our previous discussion, if the
rank of A which is of order $(n \times n)$ is r, then we may solve for r of the
variables $x_1, x_2, \ldots, x_n$ in terms of the $n - r$ remaining ones which can then be
assigned arbitrary values.

theorem 9-11 (solution of homogeneous systems) The homogeneous
system of equations

\[ AX = 0, \]

where A is of order $(n \times n)$ and X, 0 are of order $(n \times 1)$ always has the
trivial solution X = 0. It has a non-trivial solution only when $|A| = 0$. If
A is of rank $r < n$, it is possible to solve for r variables in terms of the $n - r$
remaining variables which may then be assigned arbitrary values. If X is a
non-trivial solution, so also is $\lambda X$, where $\lambda$ is an arbitrary scalar.

Example 9-22 Solve the equations

\[
\begin{aligned}
x_1 - x_2 + x_3 &= 0 \\
2x_1 + x_2 - x_3 &= 0 \\
x_1 + 5x_2 - 5x_3 &= 0.
\end{aligned}
\]

Solution There is the trivial solution $x_1 = x_2 = x_3 = 0$ and, since the
determinant associated with the coefficient matrix vanishes, there are also
non-trivial solutions. The augmented matrix is now

\[
(A, 0) = \begin{bmatrix} 1 & -1 & 1 & 0 \\ 2 & 1 & -1 & 0 \\ 1 & 5 & -5 & 0 \end{bmatrix}
\]

which is easily reduced by elementary row transformations to the echelon
form

\[
\begin{bmatrix} 1 & -1 & 1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]

This shows that there are only two equations between the three variables
$x_1$, $x_2$, and $x_3$, for the echelon form of the augmented matrix is seen to be
equivalent to the two scalar equations

\[ x_1 - x_2 + x_3 = 0 \quad\text{and}\quad x_2 - x_3 = 0. \]

Hence, assigning $x_3$ arbitrarily, we have for our solution $x_1 = 0$ and $x_2 =
x_3 = k$ (say).

A practical numerical method of solution called Gaussian elimination is 
usually used when dealing with inhomogeneous systems of n equations 
involving n variables. This is essentially the same method as the one described 
above for the reduction of an augmented matrix to echelon form. The only 
difference is that it is not necessary to make the first non-zero element 
appearing in any row in the position corresponding to the leading diagonal 
equal to unity. We illustrate the method by example. 

Example 9-23 Solve the following equations by Gaussian elimination:

\[
\begin{aligned}
x_1 - x_2 - x_3 &= 0 \\
3x_1 + x_2 + 2x_3 &= 6 \\
2x_1 + 2x_2 + x_3 &= 2.
\end{aligned}
\]

Solution The augmented matrix is

\[
(A, K) = \begin{bmatrix} 1 & -1 & -1 & 0 \\ 3 & 1 & 2 & 6 \\ 2 & 2 & 1 & 2 \end{bmatrix}.
\]

Subtracting three times row 1 from row 2 and twice row 1 from row 3 gives

\[
\begin{bmatrix} 1 & -1 & -1 & 0 \\ 0 & 4 & 5 & 6 \\ 0 & 4 & 3 & 2 \end{bmatrix}.
\]

Subtraction of row 2 from row 3 gives

\[
\begin{bmatrix} 1 & -1 & -1 & 0 \\ 0 & 4 & 5 & 6 \\ 0 & 0 & -2 & -4 \end{bmatrix}.
\]

The solution is now found by the process of 'back-substitution' using the
scalar equations corresponding to this modified augmented matrix. That is,
the equations

\[
\begin{aligned}
x_1 - x_2 - x_3 &= 0 \\
4x_2 + 5x_3 &= 6 \\
-2x_3 &= -4.
\end{aligned}
\]

The last equation gives $x_3 = 2$ and, using this result in the second then gives
$x_2 = -1$. Combination of these results in the first equation then gives
$x_1 = 1$.
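Gaussian elimination followed by back-substitution may be sketched as follows. This is a minimal illustration, not a production routine: the function name `gauss_solve` is an assumption, exact fractions are used so that no rounding occurs, rows are interchanged only when a zero pivot is met, and a unique solution ($|A| \neq 0$) is assumed. The data are those of Example 9-23.

```python
from fractions import Fraction


def gauss_solve(a, k):
    """Solve AX = K for square A by elimination and back-substitution."""
    n = len(a)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(a, k)]

    # Forward elimination: produce zeros below the leading diagonal.
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("|A| = 0: no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]

    # Back-substitution, starting from the last equation.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


A = [[1, -1, -1], [3, 1, 2], [2, 2, 1]]
K = [0, 6, 2]
print([str(v) for v in gauss_solve(A, K)])
```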

It is not proposed to offer more than a few general remarks about the
solutions of m equations involving n variables. If the equations are
consistent, but there are more equations than variables so that $m > n$, it is clear
that there must be linear dependence between the equations. In the case that
the rank of the coefficient matrix is equal to n there will obviously be a
unique solution for, despite appearances, there will be only n linearly
independent equations involving n variables. If, however, the rank r is less than n
we are in the situation of solving for r of the variables in terms of the
remaining $n - r$ variables whose values may be assigned arbitrarily. In the
remaining case where there are fewer equations than variables we have
$m < n$. When this system is consistent it follows that at least $n - m$ variables
must be assigned arbitrary values.

9-8 Eigenvalues and eigenvectors

Let us examine the consequence of requiring that in the system

\[ AX = K, \tag{9-40} \]

where A is of order $(n \times n)$ and X, K are of order $(n \times 1)$, the vector K is
proportional to the vector X itself. That is, we are requiring that $K = \lambda X$,
where $\lambda$ is some scalar multiplier as yet unknown. This requires us to solve
the system

\[ AX = \lambda X, \tag{9-41} \]

which is equivalent to the homogeneous system

\[ (A - \lambda I)X = 0, \tag{9-42} \]

where I is the unit matrix.

Now we know from Theorem 9-11 that Eqn (9-42) can only have a
non-trivial solution when the determinant associated with the coefficient matrix
vanishes, so that we must have

\[ |A - \lambda I| = 0. \tag{9-43} \]

When expanded, this determinant gives rise to an algebraic equation of
degree n in $\lambda$ of the form

\[ \lambda^n + \alpha_1\lambda^{n-1} + \alpha_2\lambda^{n-2} + \cdots + \alpha_n = 0. \tag{9-44} \]

The determinant (9-43) is called the characteristic determinant associated with
A and Eqn (9-44) is called the characteristic equation. It has n roots $\lambda_1, \lambda_2,
\ldots, \lambda_n$, each of which is called either an eigenvalue, a characteristic root, or,
in some texts, a latent root of A.

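Example 9-24 Find the characteristic equation and the eigenvalues
corresponding to

\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}. \]

Solution We have

\[
A - \lambda I = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix} - \lambda\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 - \lambda & 2 \\ 3 & -\lambda \end{bmatrix}
\]

so that

\[
|A - \lambda I| = \begin{vmatrix} 1 - \lambda & 2 \\ 3 & -\lambda \end{vmatrix} = \lambda^2 - \lambda - 6.
\]

Thus the characteristic equation is

\[ \lambda^2 - \lambda - 6 = 0, \]

and its roots, the eigenvalues of A, are $\lambda = 3$ and $\lambda = -2$.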

No consideration will be given here to the interpretation that is to be 
placed on the appearance of repeated roots of the characteristic equation, 
and henceforth we shall always assume that all the eigenvalues (roots) are 
distinct. 

Returning to Eqn (9-42) and setting $\lambda = \lambda_i$, where $\lambda_i$ is any one of the
eigenvalues, we can then find a corresponding solution vector $X_i$ which,
because of Theorem 9-11, will only be determined to within an arbitrary
scalar multiplier. This vector $X_i$ is called either an eigenvector, a characteristic
vector, or a latent vector of A corresponding to $\lambda_i$. The eigenvectors of a
square matrix A are of fundamental importance in both the theory of matrices
and in their application, and some indication of this will be given later in
Section 15-8.



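Example 9-25 Find the eigenvectors of the matrix A in Example 9-24.

Solution Use the fact that the eigenvalues have been determined as being
$\lambda = 3$ and $\lambda = -2$ and make the identifications $\lambda_1 = 3$ and $\lambda_2 = -2$. Now
let the eigenvectors $X_1$ and $X_2$, corresponding to $\lambda_1$ and $\lambda_2$, be denoted by

\[
X_1 = \begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \end{bmatrix}
\quad\text{and}\quad
X_2 = \begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix}.
\]

Then for the case $\lambda = \lambda_1$, Eqn (9-42) becomes

\[
\begin{bmatrix} (1 - 3) & 2 \\ 3 & (0 - 3) \end{bmatrix}\begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \end{bmatrix} = 0,
\]

whence

\[ -2x_1^{(1)} + 2x_2^{(1)} = 0 \quad\text{and}\quad 3x_1^{(1)} - 3x_2^{(1)} = 0. \]

These are automatically consistent by virtue of their manner of definition,
so that we find that $x_1^{(1)} = x_2^{(1)}$. So, arbitrarily assigning to $x_1^{(1)}$ the value
$x_1^{(1)} = 1$, we find that the eigenvector $X_1$ corresponding to $\lambda_1 = 3$ is

\[ X_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \]

A similar argument for $\lambda = \lambda_2$ gives

\[
\begin{bmatrix} (1 + 2) & 2 \\ 3 & (0 + 2) \end{bmatrix}\begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix} = 0,
\]

whence

\[ 3x_1^{(2)} + 2x_2^{(2)} = 0. \]

Again, arbitrarily assigning to $x_1^{(2)}$ the value $x_1^{(2)} = 1$, we find that $x_2^{(2)} =
-3/2$. Thus the eigenvector $X_2$ corresponding to $\lambda_2 = -2$ is

\[ X_2 = \begin{bmatrix} 1 \\ -\tfrac{3}{2} \end{bmatrix}. \]

Obviously $\mu X_1$ and $\mu X_2$ are also eigenvectors for any arbitrary scalar $\mu$.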
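For a matrix of order (2 x 2) the characteristic equation is the quadratic $\lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0$, so Examples 9-24 and 9-25 can be reproduced in a few lines. The following is a sketch only: the function name `eigen_2x2` is an assumption, the eigenvalues are assumed real and distinct (as in the text), and the element $a_{12}$ is assumed non-zero so that each eigenvector can be scaled to have first element 1.

```python
import math


def eigen_2x2(a):
    """Eigenvalues and eigenvectors of a 2x2 matrix with distinct real roots."""
    (a11, a12), (a21, a22) = a
    trace = a11 + a22
    determinant = a11 * a22 - a12 * a21
    disc = trace * trace - 4 * determinant  # discriminant of the quadratic
    if disc <= 0 or a12 == 0:
        raise ValueError("sketch assumes distinct real eigenvalues and a12 != 0")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        # First row of (A - lam I)X = 0 with x1 = 1: (a11 - lam) + a12*x2 = 0.
        pairs.append((lam, (1.0, -(a11 - lam) / a12)))
    return pairs


A = [[1, 2], [3, 0]]
pairs = eigen_2x2(A)
for lam, vec in pairs:
    print("eigenvalue", lam, "eigenvector", vec)
```

Each eigenvector is fixed only to within a scalar multiplier; the choice $x_1 = 1$ mirrors the arbitrary assignment made in the example.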






9-9 Linear transformations 

Any introductory account of matrices would be incomplete were the basic
idea of a linear transformation not to be mentioned. Some discussion of this
important concept has already been offered in Section 9-1, and we now
develop the idea a little further. Indeed, to recapitulate briefly, it was
explained there how a linear transformation is just a simple form of mapping of
the points of one plane into the points of another. This idea is still useful
when a matrix vector X of order $(n \times 1)$ is mapped by a matrix
transformation into what is called its image $\tilde{X}$ under the transformation. In this context
the elements of X are usually considered to be the components of a vector
in an n-dimensional space, so that X then specifies a point in that space,
and $\tilde{X}$ is its image point under the linear transformation. We propose to
work with the following straightforward definition of such a transformation.

definition 9-17 (linear transformation) A general linear transformation
or point transformation of the vector X of order $(n \times 1)$ into the image $\tilde{X}$ of
order $(n \times 1)$ is defined to be a transformation of the form

\[ \tilde{X} = AX + K, \]

where the coefficient matrix A is of order $(n \times n)$ and the vector K is of order
$(n \times 1)$.



The special case considered in Section 9-1 involved a mapping of points
of the plane brought about solely by a rotation of the plane through an angle
$\theta$ about the origin. In that case the transformation corresponded to K = 0,
and

\[
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
\tag{9-45}
\]

This matrix is called an orthogonal matrix because it has the property that
$A' = A^{-1}$, and it is representative of a very important class of square matrices.
The first row of A is seen to contain the direction cosines of Ox' with respect
to Ox and Oy, whilst the second row contains the direction cosines of Oy'
with respect to Ox and Oy.
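The orthogonality property $A' = A^{-1}$ is equivalent to $A'A = I$, which is easily checked numerically for the rotation matrix. The sketch below is illustrative; the helper names and the angle 0.7 radians are arbitrary choices, and the comparison is made to within a small floating-point tolerance.

```python
import math


def rotation(theta):
    """Plane rotation matrix of the form (9-45)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


A = rotation(0.7)  # arbitrary illustrative angle
product = matmul(transpose(A), A)  # should be the unit matrix

is_orthogonal = all(
    abs(product[i][j] - (1.0 if i == j else 0.0)) < 1e-12
    for i in range(2) for j in range(2)
)
print(product, is_orthogonal)
```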

More generally, consider the rectangular axes $O\{x_1, x_2, x_3\}$ which are
arbitrarily rotated about origin O to form the axes system $O\{x_1', x_2', x_3'\}$, in
which the direction cosine of $Ox_i'$ with respect to $Ox_j$ becomes $\nu_{ij}$. Then the
matrix

\[
A = \begin{bmatrix}
\nu_{11} & \nu_{12} & \nu_{13} \\
\nu_{21} & \nu_{22} & \nu_{23} \\
\nu_{31} & \nu_{32} & \nu_{33}
\end{bmatrix}
\tag{9-46}
\]

is strictly analogous to matrix (9-45), and it is easily seen that $\tilde{X}$ and X are
related by

\[ \tilde{X} = AX. \tag{9-47} \]



In the special case that the rotation is only about the $x_3$-axis through an
angle $\theta$ in the sense shown in Fig. 9-1, then $\nu_{13} = \nu_{31} = \nu_{32} = \nu_{23} = 0$ and
$\nu_{33} = 1$, and

\[
A = \begin{bmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}.
\tag{9-48}
\]

When discussing an application of a linear transformation to the theory of
elasticity in the next section we shall have occasion to refer to this matrix
again.

Aside from the rotation transformation characterized by Eqns (9-46)
and (9-47) there are three other simple transformations worthy of note and
these are listed below. It is left as an exercise for the reader to verify their
main properties when related to the plane which give rise to their names.

1. The identity transformation This is the transformation $\tilde{X} = X$, and it
corresponds to the case K = 0 and A = I. Under this transformation X and
its image $\tilde{X}$ are coincident.

2. The translation transformation This is the transformation $\tilde{X} = X + K$,
and it corresponds to an arbitrary non-zero vector K and A = I. The effect
of the transformation is to translate X to its image $\tilde{X}$, without rotation or
change of scale.

3. Dilatation transformation This is a transformation $\tilde{X} = AX$, in which A
is a non-singular diagonal matrix. Its effect when mapping X into $\tilde{X}$ is to
change the scale of the different elements of X without translation or rotation.
In the special case that all the diagonal elements are equal, say to $\lambda$, where
$\lambda > 1$, its effect is one of a magnification of X.



Example 9 26 If 

x 



X = 



y. 



x = 



x 

y'A 



and 



A = 



'5 0" 
2 



deduce the image of the curve y = sinh x under the transformation 
X = AX. 



Solution  We have

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}, \]

so that

\[ x' = 5x \quad \text{and} \quad y' = 2y. \]

Thus the image curve of $y = \sinh x$ is given parametrically by

\[ x' = 5x \quad \text{and} \quad y' = 2\sinh x \]

or, equivalently, by

\[ y' = 2\sinh(x'/5). \]
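The result of this example can be checked numerically. The short Python sketch below maps a few sample points of the curve y = sinh x with the diagonal matrix diag(5, 2) and confirms that each image point satisfies y' = 2 sinh(x'/5); the function name `image_point` and the sample abscissae are assumptions made only for this illustration.

```python
import math

def image_point(x):
    """Image of the point (x, sinh x) under the dilatation with A = diag(5, 2)."""
    y = math.sinh(x)
    return 5.0 * x, 2.0 * y

samples = (-1.0, 0.0, 0.5, 2.0)      # arbitrary sample abscissae
on_image_curve = all(
    math.isclose(yp, 2.0 * math.sinh(xp / 5.0), rel_tol=1e-12, abs_tol=1e-12)
    for xp, yp in (image_point(x) for x in samples)
)
print(on_image_curve)
```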





9-10 Applications of matrices and linear transformations

It is the object of this final section to indicate a few of the diverse applications 
of the work of this chapter. Of necessity, we will be able to do no more than 
outline this large and fruitful field of study, and for our first example we 
look to the notion of rank to enable us to prove an important result in 
dimensional analysis known as the Buckingham Pi theorem. 

9-10 (a) Application of rank to dimensional analysis: Buckingham Pi Theorem

In many branches of engineering and science, a valuable method of approach 
to difficult problems is via the method of dimensional analysis touched on 
briefly at the start of Chapter 5. In essence, this method seeks first to char- 
acterize a physical situation by forming dimensionless groups from the 
variables involved, and then to determine the functional relationships which 
relate these dimensionless groups. Our contribution will be to the first part 
of this process, for we shall determine how many dimensionless groups exist. 
Let us suppose that a physical situation is described by n variables
$u_1, u_2, \ldots, u_n$, each of which corresponds to a physical quantity. Suppose
also that each of these quantities is capable of expression dimensionally in
terms of length [L], mass [M], and time [T], and that $u_i$ has dimensions

\[ [L]^{a_i}[M]^{b_i}[T]^{c_i}. \]

Then the product of powers

\[ u_1^{k_1} u_2^{k_2} \cdots u_n^{k_n}, \tag{9-49} \]

where $k_1, k_2, \ldots, k_n$ are real numbers, must have dimensions

\[ [L]^{a_1k_1 + a_2k_2 + \cdots + a_nk_n}\,[M]^{b_1k_1 + b_2k_2 + \cdots + b_nk_n}\,[T]^{c_1k_1 + c_2k_2 + \cdots + c_nk_n}. \]

Such products of powers will be dimensionless, in the sense that they are
pure numbers having dimensions $[L]^0[M]^0[T]^0$, only if

\[ \begin{aligned} a_1k_1 + a_2k_2 + \cdots + a_nk_n &= 0 \\ b_1k_1 + b_2k_2 + \cdots + b_nk_n &= 0 \\ c_1k_1 + c_2k_2 + \cdots + c_nk_n &= 0 \end{aligned} \]

or, equivalently, if

\[ \begin{bmatrix} a_1 & a_2 & \cdots & a_n \\ b_1 & b_2 & \cdots & b_n \\ c_1 & c_2 & \cdots & c_n \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix} = 0. \tag{9-50} \]



Now if the rank of the coefficient matrix of order (3 × n) in Eqn (9-50)
is r, then we know from the work of Section 9-7 that it is possible to express
r of the variables $k_1, k_2, \ldots, k_n$ in terms of the remaining n − r variables,
which may be assigned arbitrarily. That is to say, it will be possible to form
n − r independent dimensionless quantities $\pi_1, \pi_2, \ldots, \pi_{n-r}$ from the
n variables $u_1, u_2, \ldots, u_n$. The dimensionless variables $\pi_i$ are called
Pi-variables. Hence we have proved the following result.

Theorem 9-12 (Buckingham Pi theorem)  Let a physical situation be
capable of description in terms of n physical quantities $u_1, u_2, \ldots, u_n$, where
$u_i$ has dimensions $[L]^{a_i}[M]^{b_i}[T]^{c_i}$. Then, if r is the rank of the matrix

\[ \begin{bmatrix} a_1 & a_2 & \cdots & a_n \\ b_1 & b_2 & \cdots & b_n \\ c_1 & c_2 & \cdots & c_n \end{bmatrix}, \]

the physical situation is capable of description in terms of n − r dimensionless
variables $\pi_1, \pi_2, \ldots, \pi_{n-r}$ formed from the variables $u_1, u_2, \ldots, u_n$.

This is best illustrated by example. In the slow viscous flow of a fluid
between parallel planes, some functional relationship of the form

\[ V = f(k, d, \eta) \]

exists between the average flow velocity V, the pressure gradient k along the
flow, the distance d between the planes and the viscosity η. The dimensions
of these quantities which will form the matrix in the Buckingham Pi theorem
are shown in the table below:

          V      k      d      η
    L     1     -2      1     -1
    M     0      1      0      1
    T    -1     -2      0     -1



The rank of the (3 × 4) matrix whose elements comprise the entries in
this table is 3, as may be seen, for example, by using elementary row
operations to reduce it to its echelon equivalent

\[ \begin{bmatrix} 1 & -2 & 1 & -1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix}, \]

in which the determinant formed from the first three columns is non-zero.

Thus, from the conditions of the theorem, the number of π variables is
4 − 3 = 1. A dimensionless grouping in this case is $kd^2/\eta V$, and any product
of powers of the form shown in (9-49) must be a power of this one dimensionless
group. Hence this physical problem is capable of description in terms of
the one dimensionless grouping $\pi = kd^2/\eta V$. As the velocity profile across
the flow only depends on the distance x from one of the walls, our result
implies that all such flows will be characterized by one curve describing the
variation of π with x/d.
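The counting argument of the theorem is easily mechanized. The following Python sketch, offered only as an illustration, finds the rank of the dimensional matrix for the viscous flow example by Gaussian elimination in exact rational arithmetic, and then checks that the exponents of V, k, d, η in the group $kd^2/\eta V$ satisfy the homogeneous system (9-50). The function name `rank` and the variable names are assumptions of this sketch.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by forward elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r

# Rows: L, M, T; columns: V, k, d, eta (entries taken from the table in the text).
D = [[1, -2, 1, -1],
     [0,  1, 0,  1],
     [-1, -2, 0, -1]]

n_pi = len(D[0]) - rank(D)                # number of Pi-variables, n - r

# Exponents of (V, k, d, eta) in the group k d^2 / (eta V).
k = [-1, 1, 2, -1]
residual = [sum(a * b for a, b in zip(row, k)) for row in D]

print(n_pi, residual)
```

Exact fractions are used so that the rank is not affected by rounding; for a matrix of integer exponents this is a natural choice.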

9-10 (b) Differentials as linear transformations 

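We now consider a generalization of the total differential as described in
Theorem 5-19 and subsequently used in Theorem 5-22. Let us suppose that

\[ \begin{aligned} u_1 &= f_1(x_1, x_2, \ldots, x_n) \\ u_2 &= f_2(x_1, x_2, \ldots, x_n) \\ &\;\;\vdots \\ u_n &= f_n(x_1, x_2, \ldots, x_n) \end{aligned} \tag{9-51} \]

then it follows from Theorem 5-19 and the properties of matrices that

\[ \begin{bmatrix} du_1 \\ du_2 \\ \vdots \\ du_n \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial f_n}{\partial x_1} & \dfrac{\partial f_n}{\partial x_2} & \cdots & \dfrac{\partial f_n}{\partial x_n} \end{bmatrix} \begin{bmatrix} dx_1 \\ dx_2 \\ \vdots \\ dx_n \end{bmatrix}. \tag{9-52} \]

This can be written

\[ du = A\,dx \tag{9-53} \]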



by identifying du, dx with the (n × 1) column vectors in Eqn (9-52) in the
obvious manner, and A with the (n × n) matrix of partial derivatives.
Viewed in this light, Eqn (9-53) may be seen to be a local linear transformation
mapping dx into du. The adjective local is used here because the transformation
will only be a linear transformation when A is a constant matrix, and as
the elements of A are functions of $x_1, x_2, \ldots, x_n$, they can only be approximated
by constants in the neighbourhood of any fixed point P with coordinates
$(x_1^P, x_2^P, \ldots, x_n^P)$. For different points P, the transformation A
will be different, showing that Eqn (9-53) represents a more general type of
transformation than a general linear point transformation.

Transformation (9-53) will be one-to-one provided that $A^{-1}$ exists, for
then a unique inverse mapping

\[ dx = A^{-1}\,du \tag{9-54} \]

will exist. The condition for this is, of course, that $|A| \neq 0$ at the point P.
This will be recognized as the non-vanishing Jacobian condition already
encountered in Chapter 5.



Fig. 9-2 Spherical polar coordinates. 




By way of example, consider the relationship between the spherical polar
coordinates $(r, \phi, \theta)$ and the Cartesian coordinates $(x, y, z)$ illustrated in
Fig. 9-2 and described by

\[ \begin{aligned} x &= r\sin\theta\cos\phi \\ y &= r\sin\theta\sin\phi \\ z &= r\cos\theta. \end{aligned} \]

Making the identifications $u_1 = x$, $u_2 = y$, $u_3 = z$, and $x_1 = r$, $x_2 = \theta$,
$x_3 = \phi$, a simple calculation shows that Eqn (9-52) will take the form

\[ \begin{bmatrix} dx \\ dy \\ dz \end{bmatrix} = \begin{bmatrix} \sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\ \sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\ \cos\theta & -r\sin\theta & 0 \end{bmatrix} \begin{bmatrix} dr \\ d\theta \\ d\phi \end{bmatrix}. \tag{9-55} \]

Denoting the square matrix in Eqn (9-55) by A, it is easily established that the
Jacobian determinant $|A| = r^2\sin\theta$. Calculating the inverse matrix $A^{-1}$
and using it to deduce the inverse mapping we have, provided $r^2\sin\theta \neq 0$,
that

\[ \begin{bmatrix} dr \\ d\theta \\ d\phi \end{bmatrix} = \frac{1}{r^2\sin\theta} \begin{bmatrix} r^2\sin^2\theta\cos\phi & r^2\sin^2\theta\sin\phi & r^2\sin\theta\cos\theta \\ r\sin\theta\cos\theta\cos\phi & r\sin\theta\cos\theta\sin\phi & -r\sin^2\theta \\ -r\sin\phi & r\cos\phi & 0 \end{bmatrix} \begin{bmatrix} dx \\ dy \\ dz \end{bmatrix}. \tag{9-56} \]
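A numerical check of the two matrices is straightforward. The Python sketch below, given purely as an illustration, codes the matrix of partial derivatives for x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ together with the inverse quoted above, and confirms at one arbitrarily chosen point that their product is the unit matrix. The function names and the sample point are assumptions of the sketch.

```python
import math

def jacobian(r, theta, phi):
    """Matrix of partial derivatives of (x, y, z) with respect to (r, theta, phi)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, r * ct * cp, -r * st * sp],
            [st * sp, r * ct * sp,  r * st * cp],
            [ct,      -r * st,      0.0]]

def inverse_jacobian(r, theta, phi):
    """Inverse matrix, including the factor 1/(r^2 sin theta)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    rows = [[r * r * st * st * cp, r * r * st * st * sp, r * r * st * ct],
            [r * st * ct * cp,     r * st * ct * sp,     -r * st * st],
            [-r * sp,              r * cp,               0.0]]
    scale = 1.0 / (r * r * st)           # requires r^2 sin(theta) != 0
    return [[scale * v for v in row] for row in rows]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

r, theta, phi = 2.0, 0.7, 1.1            # arbitrary point with r^2 sin(theta) != 0
product = matmul(inverse_jacobian(r, theta, phi), jacobian(r, theta, phi))
max_dev = max(abs(product[i][j] - (1.0 if i == j else 0.0))
              for i in range(3) for j in range(3))
print(max_dev < 1e-12)
```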



9-10 (c) Linear transformation of the stress tensor

In the mathematical theory of elasticity it is useful to introduce the concept 
of the stress vector associated with any plane element of area within a solid 
body. The magnitude of the stress vector is the force per unit area acting on 
that plane element of area, and its sense is the sense of the force which is 
exerted on that element located at point P, say, by the surrounding material. 
In a solid, unlike a liquid, this force depends on the orientation of the element 
of area, and it is convenient to describe the situation at point P by considering
elements of plane area normal to each of the unit vectors $x_1, x_2, x_3$ of a
rectangular Cartesian system $O\{x_1, x_2, x_3\}$. If the components in the $x_1$, $x_2$,
and $x_3$ directions of the stress acting on the element of area with $x_i$ as its
normal are $\tau_{i1}$, $\tau_{i2}$, and $\tau_{i3}$, then the complete information concerning the
components of stress acting on all three mutually orthogonal elements of
area at P will be contained in the following table:

Stress Components at P

                                  1              2              3
    Surface Normal to $x_1$    $\tau_{11}$    $\tau_{12}$    $\tau_{13}$
    Surface Normal to $x_2$    $\tau_{21}$    $\tau_{22}$    $\tau_{23}$
    Surface Normal to $x_3$    $\tau_{31}$    $\tau_{32}$    $\tau_{33}$



In general there will be a different table of this type for each point P in the 
solid. 



The matrix T defined by

\[ T = \begin{bmatrix} \tau_{11} & \tau_{12} & \tau_{13} \\ \tau_{21} & \tau_{22} & \tau_{23} \\ \tau_{31} & \tau_{32} & \tau_{33} \end{bmatrix} \]

is called the stress tensor at the point P, and it is fundamental to the develop- 
ment of the mathematical theory of elasticity. Let us now indicate how the 
stress tensor transforms when axes centred on P are rotated, since this is a 
situation of considerable practical importance, being related to the deter- 
mination of the directions of minimum and maximum stress at any point in 
a solid body. 




Fig. 9-3 Rotation θ about $x_3$-axis.



For this purpose we shall assume that no external moments act on the
body, for then it can be shown that T is symmetric. In addition, we will set
$\tau_{13} = \tau_{23} = \tau_{33} = 0$, which characterizes what is called a plane state of stress,
since all the forces then lie in the $(x_1, x_2)$-plane. The appropriate rotation
matrix A relating the system $O\{x_1, x_2, x_3\}$ to $O\{x_1', x_2', x_3'\}$ when a rotation
θ about the $x_3$-axis has been made is that given in Eqn (9-48) (see Fig. 9-3).

Hence, setting

\[ F = AT, \tag{9-57} \]

then the elements of row i of F will contain the components of the transformed
force vector acting on the element of area with $Ox_i'$ as normal. To
relate this result to the stress components $\tau_{ij}'$ relative to the new axes
$O\{x_1', x_2', x_3'\}$, we must use the fact that $\tau_{ij}'$ is equal to the projection of the
force acting on the element of area normal to $Ox_i'$ along $Ox_j'$. To achieve
this result by matrices we must post-multiply A by the transpose F' of F.
This is so because row i of A contains the direction cosines of axis $Ox_i'$ and
row j of F contains the components of the force acting on the element of
area with $Ox_j'$ as normal, and the rule for matrix multiplication is 'rows into
columns'. Thus if $\tilde{T}$ is the transformed stress tensor, then

\[ \tilde{T} = AF' = AT'A', \tag{9-58} \]

but as T is symmetric, T' = T, giving

\[ \tilde{T} = ATA'. \tag{9-59} \]

Using the fact that

\[ T = \begin{bmatrix} \tau_{11} & \tau_{12} & 0 \\ \tau_{12} & \tau_{22} & 0 \\ 0 & 0 & 0 \end{bmatrix} \]

when evaluating the indicated matrix products in Eqn (9-59) shows that
the stress components of the transformed stress tensor are

\[ \begin{aligned} \tau_{11}' &= \tau_{11}\cos^2\theta + \tau_{22}\sin^2\theta - 2\tau_{12}\sin\theta\cos\theta \\ \tau_{22}' &= \tau_{11}\sin^2\theta + \tau_{22}\cos^2\theta + 2\tau_{12}\sin\theta\cos\theta \\ \tau_{12}' &= (\tau_{11} - \tau_{22})\sin\theta\cos\theta + \tau_{12}(\cos^2\theta - \sin^2\theta) \end{aligned} \tag{9-60} \]

with

\[ \tau_{13}' = \tau_{23}' = \tau_{33}' = 0. \]

These results form the basis of many important studies involving plane 
stress in solids on which no external moment is acting. 
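The transformation law for plane stress can be checked numerically. In the Python sketch below, which is an illustration under stated assumptions rather than part of the text, the in-plane block of the rotation matrix is taken to have rows (cos θ, −sin θ) and (sin θ, cos θ), the product A T A' is formed directly, and the result is compared with the closed-form expressions for the transformed components. The function names and the sample stresses are hypothetical.

```python
import math

def matmul(P, Q):
    """Matrix product using the 'rows into columns' rule."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def rotate_plane_stress(t11, t12, t22, theta):
    """Return A T A' for the in-plane (2 x 2) block of a symmetric stress tensor."""
    c, s = math.cos(theta), math.sin(theta)
    A = [[c, -s], [s, c]]                 # assumed in-plane block of the rotation matrix
    T = [[t11, t12], [t12, t22]]
    return matmul(matmul(A, T), transpose(A))

def closed_form(t11, t12, t22, theta):
    """Transformed components (t11', t22', t12') from the closed-form expressions."""
    c, s = math.cos(theta), math.sin(theta)
    return (t11 * c * c + t22 * s * s - 2.0 * t12 * s * c,
            t11 * s * s + t22 * c * c + 2.0 * t12 * s * c,
            (t11 - t22) * s * c + t12 * (c * c - s * s))

t11, t12, t22, theta = 3.0, 1.5, -2.0, 0.4    # illustrative values only
Tt = rotate_plane_stress(t11, t12, t22, theta)
f11, f22, f12 = closed_form(t11, t12, t22, theta)

agree = (math.isclose(Tt[0][0], f11) and math.isclose(Tt[1][1], f22)
         and math.isclose(Tt[0][1], f12) and math.isclose(Tt[1][0], f12))
print(agree)
```

As a further sanity check, the sum of the two direct stresses is unchanged by the rotation, which the sketch can confirm by comparing `Tt[0][0] + Tt[1][1]` with `t11 + t22`.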



PROBLEMS 

Section 9-1

9-1 Suggest two physical situations in which the outcomes may be displayed in
the form of a matrix. 

9-2 Find the sum A + B and difference A — B of the matrices 





" 1 2 


3 


4 " 




" 2 3 


1 


2 " 








A = 


2 12 2 
.12 0. 


, B = 


2 2 
. 1 -2 1 1 . 








9-3 Evaluate the following inner products: 








"1" 




"2" 






" 2" 


(a) [2 113] 


2 
2 


; (b) [1 -2 7 4] 


3 



; (c) [2 -1 


3 1] 


-1 
3 






-1- 








-1- 








- 1. 



9-4 Evaluate the following matrix products: 



3 12" 


"1 


2" 






1 


-1 




12 2 2 








; (b) 


1 1 1 0_ 


-1 


1- 





(a) 



9-5 State which of the following forms of matrix product are defined and, where 
appropriate, give the shape of the resulting product matrix: 









-1 


2 1 


3" 


~-i r 

2 2 






1 


-1 1 


-1 


1 1 






1 





1. 


- 3 2- 







(a) (7 × 3)(3 × 9);
(b) (5 × 3)(2 × 3);
(c) (1 × 9)(9 × 1);
(d) (3 × 1)(1 × 4).

9-6 If the matrices I, A, and B are given by



1 = 



"1 0" 




"2 1 3" 


1 


, A = 


1 2 1 


.0 1_ 




.5 1 4. 



and B = 



1 





2 


1 


-1 


3 


1 





2 


1 


2 


1 



show that 

(a) IA = AI = A; 

(b) I B = B but that B I is not defined. 

9-7 Give an example of matrices A and B for which: 

(a) the product A B is defined but the product B A is not; 

(b) the products A B and B A are both defined but are matrices of different 
order ; 

(c) the products A B and B A are both defined and are the same order as 
A and B, but they are not equal. 

9-8 Display each of the following sets of simultaneous equations in matrix form: 

(a) 2x + 4y + z = 9
x - 3y + 2z = -4
x + y - z = 1,

(b) w + 2x - y = 4
x - 3y + 2z = -1
2w + 5x - 3z = 0
4w - y + 4z = 2,

(c) 3w + x - 2y + 4z = 1

w - 3x + y - 3z = 4
w + Ix + 2y + 5z = 2, 

(d) 2x + y - z = λx
3x + 2y + 4z = λy
x - 3y + 2z = λz.

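9-9 Let matrices $R_\theta$ and $R_\phi$ be defined as follows:

\[ R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad \text{and} \quad R_\phi = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \]

and let

\[ X = \begin{bmatrix} x \\ y \end{bmatrix}, \quad X' = \begin{bmatrix} x' \\ y' \end{bmatrix} \quad \text{and} \quad X'' = \begin{bmatrix} x'' \\ y'' \end{bmatrix}. \]

Then, if

\[ X' = R_\theta X \quad \text{and} \quad X'' = R_\phi X', \]

show by matrix multiplication and use of trigonometric identities that

\[ R_\phi[R_\theta X] = R_{\theta+\phi} X. \]

Interpret this result geometrically.

Section 9-2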

9-10 Construct the matrices of order (3 × 2) whose general element $a_{ij}$ has the form:

(a) $a_{ij} = i^2 + j^2 - 2ij$;

(b) $a_{ij} = \sin i\theta \cos j\theta$.

9-11 State which of the following pairs of matrices can be made equal by assigning 
suitable values to the constants a, b, and c. Where appropriate, determine what 
these values must be. 

fa) 



(b) 



"1 2 1 


0" 




"1 2 


1 0" 




3 a b 


2 


and 


3 1 


2 2 


» 


1 2 c 


1_ 




.1 2 


4 1. 




1 5 a 


2" 




"1 5 


1 2" 




2 «2 3 


b 


and 


2 4 


3 4 


> 


.4 3 2 


c_ 




.4 3 


2 1. 




1 


(a + b) 3 " 




"1 4 3" 


(fl + c) 


2 4 


and 


2 4 


1 




2 (b 


+ c)_ 




.1 2 2. 



(c) 



9-12 Find the numbers a, b, c, and d in order that the following matrix equation 
should be valid: 

'6 4 6" 

6 -1 -2 

.3 6 6. 

9-13 Use Definitions 9-3 and 9-4 to prove that if λ, μ are scalars and matrices A
and B are conformable for addition, then

(a) λ(A + B) = λA + λB,

(b) λA + μA = (λ + μ)A.

9-14 Determine 3 A + 2B and 2A — 6B given that 

"2 - 



2a 1 


5" 




"2 3 r 




3 2 


-b 


+ 


3 </ 4 


= 


3c 4 


1. 




.3 2 5. 





_ ri 37' 

~ L2 -1 6 



and B = 



1 *]• 

3 2j 






9-15 If



and 



D 



'2 1 
3 2 
1 



B = 



1 1 0" 



1 1 

3 1 









"2 3 4~ 


, c = 






1 5 6 







find the matrix products A B and C D. 

9-16 This example shows that the matrix product A B = 0 does not necessarily
imply either that A = 0 or that B = 0. If,

T 1 -1 11 



-3 2 -1 



-2 



1 



oJ 



and B = 



"1 2 
2 4 
.1 2 



find A B and B A and show that A B ≠ B A.

9-17 Show that the matrix equation
AX = K,
where



A = 



"i 3 r 




~Xl~ 


l i 2 


, x = 


X2 


.2 2 0. 




. x ^ 



and K = 



may be solved for x u x 2 , and x 3 by pre-multiplication by B, where 

-i i 



B = 







-i - 
i -iJ 



9-18 Use matrix multiplication to verify the results of Theorem 9-2 when A, B,
and C are of the form



"1 3 2" 




--1 


2 


r 




~-2 2 1 


1 4 


, B = 


3 


-2 


-l 


, and C = 


2 4 


.2 3 1_ 




. 1 


4 


2_ 




.13 1 



9-19 If A is a square matrix, then the associative property of matrices allows us to
write $A^n$ without ambiguity because, for example, $A^3 = A(AA) = (AA)A$. If

\[ A = \begin{bmatrix} \cosh x & \sinh x \\ \sinh x & \cosh x \end{bmatrix} \]

use the hyperbolic identities to express $A^2$ and $A^3$ in their simplest form and
use induction to deduce the form of $A^n$.










2 5 7 9" 






; (0 


4 3 1_ 









4 


2 



19' 

2 
4 



9-20 Transpose the following matrices: 
(a) [1 4 17 3]; (b) 



(d) 



9-21 Use Definition 9-7 and Theorem 9-3 to prove that: 

(a) the sum of a square matrix and its transpose is a symmetric matrix; 

(b) the difference of a square matrix and its transpose is a skew-symmetric 
matrix. 

Illustrate each of these results by an example. 

9-22 Verify that (A B)' = B' A', given that 

"-4 2" 






3 


3 





2 


-1 



-2" 
1 


; (e) 


"4" 
3 
1 


0. 




-0- 



A = 



1 4 7 
9 -3 1 



and B = 



3 1 
-5 6 



9-23 If a matrix A contains complex numbers as elements it is said to be a complex 
matrix. Its complex conjugate is denoted by A* and is defined to be the matrix 
obtained from A by replacing each element by its complex conjugate. Show 
from this and the definitions given in the text that : 

(a) (A*)* = A; 

(b) (A±B)* = A* ±B*; 

(c) $(\mu A)^* = \bar{\mu}A^*$, where μ is any complex number and $\bar{\mu}$ is its complex
conjugate.

9-24 Find the complex conjugates of the matrices A and B, where 



A = 



1 2 + 'i , „ r ' J - 2i i 

and B = 
- 2/ i J 11 + i 1 + i J 



and, taking μ = 1 − i, use them to verify the results of the previous problem.

Section 9-3 

9-25 Evaluate the determinants



(a) 







1 3 




1 2 










; (b) 


2 5 


; (c) 


4 7 




1 3 7 





-5 -5 



9-26 Without expanding the determinant, prove that

\[ \begin{vmatrix} 1 + a_1 & a_1 & a_1 \\ a_2 & 1 + a_2 & a_2 \\ a_3 & a_3 & 1 + a_3 \end{vmatrix} = (1 + a_1 + a_2 + a_3). \]






9-27 Use Theorem 9-4 to simplify the following determinants before expansion : 



(a) 



(0 



A| = 



A| = 



42 61 50 

3 2 

4 6 5 
2 1 5 
5 17 56 
4 1 7 



(b) |A| = 



9 
16 

2 



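9-28 Without expanding prove that

\[ \begin{vmatrix} x^2 + a_1^2 & a_1a_2 & a_1a_3 \\ a_2a_1 & x^2 + a_2^2 & a_2a_3 \\ a_3a_1 & a_3a_2 & x^2 + a_3^2 \end{vmatrix} = x^4(x^2 + a_1^2 + a_2^2 + a_3^2). \]

9-29 Show without expansion that

\[ \begin{vmatrix} a^2 & b^2 & c^2 \\ a & b & c \\ 1 & 1 & 1 \end{vmatrix} = (a - b)(a - c)(b - c). \]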



This determinant is called an alternant determinant. Illustrate the result by 
means of a numerical example and verify it by direct expansion. 

9-30 Prove that 

sin (x + Jit) sin x cos x 

| A | = sin {x + Jit) cos x sin x 

1 a 1 - a 

is independent of a, and express it as a function of x. 

9-31 Find the minors $M_{ij}$ and cofactors $A_{ij}$ of each element $a_{ij}$ in the matrix



i -! y 

9-32 If A is an arbitrary matrix of order (3 × 3) with general element $a_{ij}$ and cofactor
$A_{ij}$, show by direct expansion that:

(a) $a_{11}A_{31} + a_{12}A_{32} + a_{13}A_{33} = 0$;

(b) $a_{13}A_{12} + a_{23}A_{22} + a_{33}A_{32} = 0$.

9-33 Use the Laplace expansion theorem to expand determinant (b) in Problem 
9-25 first in terms of elements of the third row, and then in terms of elements 
of the third column. 

9-34 If A is a matrix of order (3 x 3) and A' is its transpose, prove by direct 
expansion that 

| A | = | A' |.






Use the Laplace expansion theorem to prove that this result is true for a 
square matrix A of any order. 

9-35 Verify by direct expansion that for any square matrices A and B of order
(2 × 2):

| A B | = | A | | B |.

This result is, in fact, valid for square matrices A and B of any order. 



Section 9-4 

9-36 Which of the following sets of vectors are linearly independent, and where 
linear dependence exists determine its form: 



(a) Ci 



(b) Ri = [1 9 -2 14], R 2 =[-2 -18 4 -28]; 



"3 _ 




" 0" 




" 0" 





, C 2 = 


-7 


, C3 = 





.0. 




0. 




.15. 



"2" 




"1" 




"1" 




"5" 


1 


, C2 = 


1 


, C3 = 


2 


, C4 = 


6 


.0. 




.2. 




.1. 




.4. 



(c) Ci = 



9-37 Test the following matrices for linear independence between their rows or 
columns: 



(a) 



1 2 


-1 0" 




" 


2 


3 


r 




"1 2 1 5 - 


2 3 
1 1 


1 1 
2 


; (b) 


-2 
-3 



1 


-1 



2 
-2 


; (c) 


2 12 
10 2 1 


1 


2 3- 




— 1 


-2 


2 


0- 




-5 3 7 7. 



9-38 Find the rank of the following matrix: 
2 1 4 3' 

A= -1 2 4-2 6 

7 -4 -12 14 -12 

9-39 Construct an example of a matrix of order (4 x 3) which is (a) of rank 2, and 
(b) of rank 3. 



Section 9-5 

9-40 Show that adj A = A when 
-_4 _3 _3" 

A= 1 1 
.443. 
9-41 Find the matrix adjoint to each of the following matrices: 






(a) 



9-42 Set 



'1 2 3' 

2 3 2 

3 3 4 



(b) 



1 2 3' 
1 3 4 
1 4 3 



(c) 



a b 

c d 



caca-ca 

and equate corresponding elements to determine the inverse of 

ca- 



9-43 Find the 


inverse of 












" 3 -2 -1" 








A = 


-4 1 -1 
.201. 








Verify that:

(a) $A^{-1}A = AA^{-1} = I$;

(b) $(A^{-1})^{-1} = A$.






9-44 Given that A and B are 








"1 2 r 




"1 


-1 


A = 


1 4 2 


and B = 





2 




_0 3 2. 




.1 





verify that $(AB)^{-1} = B^{-1}A^{-1}$.






Section 9-6 













9-45 Find dA/dt and determine the largest interval about the origin in which it is
defined, given that 



AW = 


~2t 3 tanr cosfl 
.3 4-f« 1 + /J' 


•46 Given that 


A(0 = 


"cosh t sinh /") 
_sinh t coshtj 



It < 2 J 



verify results (d), (f), and (g) of Theorem 9-9. 
9-47 Show that for the matrix 

"cosf — sinr" 

_sin t cos /_ 
it is true that (d/d^A 2 = 2A(dA/dO, but that this is not true for the matrix 



A(/) = 






A(0 



L2 /tj" 



9-48 Show that if A(f ) is a non-singular matrix, then 
At At 



Verify this result when 

fcos t — sin r~l 
A= . 

|_sin / cos tj 



Section 9-7 

9-49 Solve the following equations using Cramer's rule: 

$x_1 + x_2 + x_3 = 7$

$2x_1 - x_2 + 2x_3 = 8$
$3x_1 + 2x_2 - x_3 = 11.$

9-50 Solve the equations of the previous example using the inverse matrix method 
and compare the task with the previous method. 

9-51 Solve the following equations using Cramer's rule: 

$x_1 - x_2 + x_3 - x_4 = 1$

$2x_1 - x_2 + 3x_3 + x_4 = 2$

$x_1 + x_2 + 2x_3 + 2x_4 = 3$

$x_1 + x_2 + x_3 + x_4 = 3.$

9-52 Write down the augmented matrix corresponding to the equations : 

$2x_1 - x_2 + 3x_3 = 1$

$3x_1 + 2x_2 - x_3 = 4$

$x_1 - 4x_2 + 7x_3 = 3.$

Show, by reducing this matrix to its echelon equivalent, that these equations 
are inconsistent. 

9-53 Write down the augmented matrix corresponding to the equations: 

$3x_1 + 2x_2 - x_3 = 4$
$2x_1 - 5x_2 + 2x_3 = 1$
$5x_1 + 16x_2 - 7x_3 = 10.$

Show, by reducing this matrix to its echelon equivalent, that these equations 
are consistent and solve them. 

9-54 Solve the following equations, in which α is an arbitrary constant, by reducing
the augmented matrix to echelon form:

$x_1 + \alpha x_2 + \alpha x_3 = 1$
$\alpha x_1 + x_2 + 2\alpha x_3 = -4$

$\alpha x_1 - \alpha x_2 + 4x_3 = 2.$
Consider the effect of α on the solution.



9-55 Solve the following homogeneous equations, in which α is an arbitrary constant,
by reducing the augmented matrix to echelon form:

$\alpha x_1 - x_2 - x_3 = 0$

$-x_1 + \alpha x_2 - x_3 = 0$

$-x_1 - x_2 + \alpha x_3 = 0.$

Consider the effect of α on the solution.
9-56 Solve the following equations using Gaussian elimination:
$1.202x_1 - 4.371x_2 + 0.651x_3 = 19.447$
$-3.141x_1 + 2.243x_2 - 1.626x_3 = -13.702$
$0.268x_1 - 0.876x_2 + 1.341x_3 = 6.849.$

9-57 Discuss briefly, but do not solve, the following sets of equations:

(a) $x_1 + x_2 = 1$
$2x_1 - x_2 = 5$;

(b) $x_1 + x_2 = 1$
$2x_1 - x_2 = 5$
$x_1 - x_2 = 0$;

(c) $x_1 + x_2 = 1$
$2x_1 - x_2 = 5$
$-x_1 - 2x_2 = 0$;

(d) $x_1 + x_2 - x_3 = 0$
$2x_1 - x_2 - 5x_3 = 0$.



Section 9-8 

9-58 Write down the characteristic equations for the following matrices: 



(a) A = 


"1 4" 

1 1 


■ 


(b) A = 


"1 2" 
2 11. 




.0 2 1_ 


9-59 Find the eigenvalues and eigenvectors of 


A = 


' 1 

-2 


-r 

0. 







9-60 Prove that the eigenvalues of a diagonal matrix of any order are given by the 
elements on the leading diagonal. What form do the eigenvectors take?

Section 9-9 

9-61 Verify that the matrix A in Eqn (9-46) is orthogonal, and justify the assertion
that $\tilde{X} = AX$ describes the effect of a general rotation of the rectangular
Cartesian axes $O\{x_1, x_2, x_3\}$.

9-62 Justify the name reflection transformation of the plane when applied to a 
transformation of the form 

X = AX, 

where either 



A = 



-1 01 
.0 -lj 



or A = 



-1 01 
lj 






9-63 Show that if

\[ \tilde{X} = AX, \]

where

\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \]

then

\[ \tilde{X}'\tilde{X} = X'X, \]

where the prime signifies the transpose operation. Interpret this result
geometrically.



9-64 If 



X = 



~x~ 


, x = 


~x~ 


.y. 




J- 



„ , and A = 



"_1_ _ J_ 

V2 V2 

1 _1_ 

-V2 V2J 

deduce the image of the curve y = x 2 under the transformation 

X = AX. 
Is the shape of the curve changed ? 

9-65 If 

X = 

deduce the image of the curve y — x 2 '+ 2x + 1 under the transformation 

X = AX. 
Describe the effect of the transformation in geometrical terms. 



x~ 


, x = 


~x~ 


, and A = 


"-3 0' 


J- 




J- 




•0 3. 



Section 9-10

9-66 How many dimensionless groups of variables (π variables) characterize a
physical situation described by:

(a) the four physical quantities: work ($L^2MT^{-2}$), viscosity ($L^{-1}MT^{-1}$),
pressure ($L^{-1}MT^{-2}$) and mass transfer rate ($MT^{-1}$);

(b) the five physical quantities: length ($L$), viscosity ($L^{-1}MT^{-1}$), velocity
($LT^{-1}$), area ($L^2$) and pressure ($L^{-1}MT^{-2}$).

9-67 Express in matrix form the relationship between the differentials dx, dy and
du, dv, given that

\[ u = \sinh(x^3 + y^3), \quad v = \cosh(x^3 - y^3). \]

For what values of x and y does this transformation fail to have an inverse? 




9-68 Given that 

u = x 2 + ly + 1, v = X s - Ixy + y 3 

and 

p = sin (u + v), q = cos (u − v),

display in matrix form the relationship between the differentials dx, dy and
du, dv and between du, dv and dp, dq. Use matrix multiplication to express
directly the relationship between the differentials dx, dy and dp, dq.

9-69 Justify the matrix equations (9-55) and (9-56). 

9-70 Verify that the square matrix in (9-55) is an orthogonal matrix. 

9-71 Perform the calculations required in (9-59) to give the transformed stress 
tensor components (9-60). 



Functions of a complex 
variable 



10-1 Sequences of complex numbers and limits 

When considering a definition of a sequence $\{z_n\}$ of complex numbers, we
should first examine to what extent the work of Chapter 3 on sequences of
real numbers is still relevant to complex sequences.

It will obviously be necessary to formulate new definitions, and this will
be our next task. However, since a sequence $\{u_n\}$ of real numbers is just a
special case of a sequence $\{z_n\}$ of complex numbers, any new definitions must
be compatible with the corresponding situations in Chapter 3 when related to
real sequences. Therefore, the behaviour of sequences of complex numbers
will be directly determined by the behaviour of the sequences of real numbers
that may be formed by considering separately the real and imaginary parts
of $\{z_n\}$. Thus if $z_n = [1 + (1/n)] + i(1/n^2)$ we would need to consider the
two real sequences $\{1 + (1/n)\}$ and $\{1/n^2\}$ associated with $\{z_n\}$.

Here we must note that expressions such as 'monotonic', 'finitely oscil- 
lating', and 'bounded above' cannot be applied to sequences of complex 
numbers as they cannot be ordered like the real numbers. 

Definition 10-1 (limit of complex sequence)  The infinite sequence
$\{z_n\}$ of complex numbers $z_n = x_n + iy_n$ will be said to converge or tend to the
limit $\gamma = \mu + i\nu$ if, and only if, for every $\varepsilon > 0$ there exists a number N,
such that for n > N,

\[ |\gamma - z_n| < \varepsilon. \]

When the sequence $\{z_n\}$ is convergent to γ in this sense we shall write

\[ \lim_{n\to\infty} z_n = \gamma. \]

This definition is easily seen to reduce to Definition 3-3 when applied to a
sequence of real numbers, for then the complex modulus and the absolute
value become identical in meaning.

The essential difference between Definitions 3-3 and 10-1 is embodied in
the following theorem.

Theorem 10-1 (conditions for convergence)  Let $\{z_n\}$ be an infinite sequence
of complex numbers $z_n = x_n + iy_n$. Then necessary and sufficient conditions
for

\[ \lim_{n\to\infty} z_n = \gamma, \]

where $\gamma = \mu + i\nu$, are that

\[ \lim_{n\to\infty} x_n = \mu \quad \text{and} \quad \lim_{n\to\infty} y_n = \nu. \]

Proof  A paraphrase of this theorem would be that if $\{z_n\}$ converges to γ,
then the sequence of the real parts of $\{z_n\}$ converges to the real part of γ and
the sequence of the imaginary parts of $\{z_n\}$ converges to the imaginary part
of γ. To establish the necessity of the conditions of the theorem suppose that
for some positive number ε, $|\gamma - z_n| < \varepsilon$ for n > N. Then

\[ |\gamma - z_n| = |\mu + i\nu - (x_n + iy_n)| = |(\mu - x_n) + i(\nu - y_n)|, \]

and so by the definition of the modulus of a complex number,

\[ |\gamma - z_n| = [(\mu - x_n)^2 + (\nu - y_n)^2]^{1/2}. \]

Neglecting first the non-negative term $(\mu - x_n)^2$, and then the non-negative term
$(\nu - y_n)^2$, shows that

\[ |\gamma - z_n| \ge |\nu - y_n| \quad \text{and} \quad |\gamma - z_n| \ge |\mu - x_n|. \]

Hence $|\mu - x_n| < \varepsilon$ and $|\nu - y_n| < \varepsilon$ for n > N showing, by virtue of
Definition 3-3, that

\[ \lim_{n\to\infty} x_n = \mu \quad \text{and} \quad \lim_{n\to\infty} y_n = \nu. \]

The sufficiency of these conditions is almost immediate. If

\[ \lim_{n\to\infty} x_n = \mu \quad \text{and} \quad \lim_{n\to\infty} y_n = \nu, \]

then for any positive ε choose N such that $|\mu - x_n| < \varepsilon$ and $|\nu - y_n| < \varepsilon$
for n > N. Then, as

\[ |\gamma - z_n| = [(\mu - x_n)^2 + (\nu - y_n)^2]^{1/2}, \]

it follows that

\[ |\gamma - z_n| < \sqrt{2\varepsilon^2} = \varepsilon\sqrt{2}. \]

This establishes our result because ε was arbitrary and so $|\gamma - z_n|$ can
always be made arbitrarily small by a suitable choice of ε.

The fact that a sequence of real numbers can only have one limit implies
the uniqueness of μ and ν, and hence the uniqueness of γ. Consequently we
have arrived at the following result.

Corollary 10-1  If the sequence $\{z_n\}$ of complex numbers is convergent, then
it only has one limit.

Example 10-1  Examine each of the following sequences $\{z_n\}$ of complex
numbers for convergence and, where appropriate, determine the limit.

(a) $z_n = \left(1 + \dfrac{1}{n}\right) + i\sin\dfrac{n\pi}{2}$;

(c) $z_n = n\sin\dfrac{2}{n} + i\sqrt{n}\,[\sqrt{n + 6} - \sqrt{n}]$.

Solutions  We shall obtain our results by means of a direct application of
Theorem 10-1.

(a) Making the identifications

\[ x_n = 1 + \frac{1}{n}, \quad y_n = \sin\frac{n\pi}{2}, \]

we see that $\lim_{n\to\infty} x_n = 1$, whereas the sequence $\{y_n\}$ has no limit since $y_n$
assumes successively only the three values 1, 0, and −1. Hence the sequence
$\{z_n\}$ does not converge and so has no limit.

(b) Making the identifications 

2n + 1 /n-\Y 



-W 



x n - -3^-, y 

we see that

\[ \lim_{n\to\infty} x_n = \tfrac{2}{3} \quad \text{and} \quad \lim_{n\to\infty} y_n = 1. \]

Hence the sequence $\{z_n\}$ converges and

\[ \lim_{n\to\infty} z_n = \tfrac{2}{3} + i. \]

As the numbers $\tfrac{2}{3}$ and 1 are not members of their defining sequences $\{x_n\}$
and $\{y_n\}$, the complex limit γ is not included as a member of the sequence.

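(c) Make the identifications

\[ x_n = n\sin\frac{2}{n}, \quad y_n = \sqrt{n}\,[\sqrt{n + 6} - \sqrt{n}]. \]

Then

\[ \lim_{n\to\infty} x_n = 2 \]

and

\[ \lim_{n\to\infty} y_n = \lim_{n\to\infty} \sqrt{n}\cdot\sqrt{n}\left[\left(1 + \frac{6}{n}\right)^{1/2} - 1\right] = \lim_{n\to\infty} n\left[1 + \frac{3}{n} + O\!\left(\frac{1}{n^2}\right) - 1\right] = 3. \]

Thus the sequence $\{z_n\}$ converges and

\[ \lim_{n\to\infty} z_n = 2 + 3i. \]

For the same reason as in (b) above, the limit 2 + 3i is not a member of the
sequence $\{z_n\}$.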
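The convergence found in part (c) can be observed numerically. The following Python sketch, included only as an illustration, evaluates $z_n = n\sin(2/n) + i\sqrt{n}[\sqrt{n+6} - \sqrt{n}]$ for increasing n and records the modulus of its difference from 2 + 3i; the function name `z` and the chosen values of n are assumptions of the sketch.

```python
import math

def z(n):
    """Term z_n of the sequence in part (c)."""
    x_n = n * math.sin(2.0 / n)
    y_n = math.sqrt(n) * (math.sqrt(n + 6) - math.sqrt(n))
    return complex(x_n, y_n)

limit = complex(2, 3)
errors = [abs(limit - z(n)) for n in (10, 100, 1000, 10000)]

# The modulus |limit - z_n| should shrink steadily as n grows.
for n, e in zip((10, 100, 1000, 10000), errors):
    print(n, e)
```

Working with the modulus of the difference mirrors the definition of the limit of a complex sequence, while the real and imaginary parts could equally well be examined separately, as Theorem 10-1 allows.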

Arguments essentially similar to those given in Theorem 10-1 establish
results for complex sequences that are strictly analogous to those of
Theorem 3-1. We state them below without proof.

Theorem 10-2  If it can be shown that $\{w_n\}$ and $\{z_n\}$ are two convergent
sequences of complex numbers with $\lim_{n\to\infty} w_n = \lambda$ and $\lim_{n\to\infty} z_n = \gamma$, then

(a) $w_1 + z_1, w_2 + z_2, w_3 + z_3, \ldots$ is a sequence such that
$\lim_{n\to\infty} (w_n + z_n) = \lambda + \gamma$;

(b) $w_1z_1, w_2z_2, w_3z_3, \ldots$ is a sequence such that
$\lim_{n\to\infty} w_nz_n = \lambda\gamma$;

(c) provided $\gamma \neq 0$, $w_1/z_1, w_2/z_2, w_3/z_3, \ldots$ is a sequence such that
$\lim_{n\to\infty} (w_n/z_n) = \lambda/\gamma$.

Example 10-2 If $w_n = n(1 + i)/(n + 1)$ and $z_n = (1/n) + [(n^2 + 1)/(2n^2 + 3)]\,i$,
find (a) $\lim_{n\to\infty} (w_n + z_n)$; (b) $\lim_{n\to\infty} (w_n z_n)$; and (c) $\lim_{n\to\infty} (w_n/z_n)$.

Solution By inspection we have

$$\lim_{n\to\infty} w_n = 1 + i \quad\text{and}\quad \lim_{n\to\infty} z_n = \tfrac{1}{2}i.$$

Hence by Theorem 10-2,

(a) $\lim_{n\to\infty} (w_n + z_n) = (1 + i) + \tfrac{1}{2}i = 1 + \tfrac{3}{2}i$;

(b) $\lim_{n\to\infty} (w_n z_n) = (1 + i)\,\tfrac{1}{2}i = \tfrac{1}{2}(i - 1)$;

(c) $\lim_{n\to\infty} (w_n/z_n) = (1 + i)/(\tfrac{1}{2}i) = 2 - 2i$.
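The three limits of Example 10-2 may be checked numerically by evaluating the sequences at a large index. The following is only a sketch: the function names and the choice n = 10^6 are ours, and a numerical check of this kind illustrates Theorem 10-2 without proving it.

```python
# Numerical illustration of Example 10-2 (not a proof).
# The names w, z and the index n = 10**6 are illustrative choices.

def w(n):
    # w_n = n(1 + i)/(n + 1)
    return n * (1 + 1j) / (n + 1)

def z(n):
    # z_n = 1/n + i (n^2 + 1)/(2 n^2 + 3)
    return 1 / n + 1j * (n**2 + 1) / (2 * n**2 + 3)

n = 10**6
sum_n = w(n) + z(n)    # should be close to 1 + (3/2) i
prod_n = w(n) * z(n)   # should be close to (i - 1)/2
quot_n = w(n) / z(n)   # should be close to 2 - 2i

print(sum_n, prod_n, quot_n)
```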



If the terms of a sequence are plotted as points in the complex plane, 
Theorem 10-1 and its Corollary imply that when that sequence is convergent, 
it will have the property that with increasing n its points will cluster ever 
closer to the one point that represents its limit $\gamma$. In other cases it may happen
that although a sequence is not convergent, nevertheless when its terms are 
plotted in the complex plane they cluster around two or more distinct points. 
By analogy with sequences of real numbers, these points will be called limit 
points of the complex sequence, and for their definition they require the 
notion of a neighbourhood of a point. 

Accordingly, we shall use the term a neighbourhood of the point $\zeta$ in the
complex plane to mean the interior of any circle centred on $\zeta$. This idea
enables us to define a limit point.

Definition 10-2 (limit point) The point $\zeta$ will be called a limit point of
the sequence $\{z_n\}$ of complex numbers if every neighbourhood of $\zeta$ contains
at least one point of $\{z_n\}$ other than the point $\zeta$ itself.

It is an immediate consequence of this definition that every neighbourhood
of a limit point $\zeta$ of $\{z_n\}$ contains an infinite number of points of $\{z_n\}$. We
again emphasize that Theorem 10-1 together with its Corollary imply that a
convergent sequence $\{z_n\}$ of complex numbers can have only one limit point.

Example 10-3 Identify the limit points of the sequence $\{z_n\}$ where

$$z_n = \left(2 - \frac{1}{n}\right) + i(-1)^n\left(1 + \frac{1}{n}\sin\frac{n\pi}{2}\right).$$

Solution Make the identifications

$$x_n = 2 - \frac{1}{n} \quad\text{and}\quad y_n = (-1)^n\left(1 + \frac{1}{n}\sin\frac{n\pi}{2}\right).$$

Then $\{x_n\}$ converges to the limit 2 and thus has one limit point, whilst $\{y_n\}$
does not converge but has the two limit points 1 and $-1$. Hence the sequence
$\{z_n\}$ has the two limit points $2 + i$ and $2 - i$.
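The clustering described in Example 10-3 can be seen by evaluating one even-indexed and one odd-indexed term far along the sequence. This is a sketch under our own naming; the indices chosen are arbitrary large values.

```python
import math

def z(n):
    # z_n = (2 - 1/n) + i (-1)^n (1 + (1/n) sin(n pi / 2))
    x = 2 - 1 / n
    y = (-1) ** n * (1 + math.sin(n * math.pi / 2) / n)
    return complex(x, y)

even_term = z(10**6)      # even index: expected to lie near 2 + i
odd_term = z(10**6 + 1)   # odd index: expected to lie near 2 - i

print(even_term, odd_term)
```

Terms with even index approach one limit point and terms with odd index approach the other, which is why the sequence as a whole has no limit.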

10-2 Curves and regions 

The notions of a curve and a region in the real plane may be immediately 
extended to the complex plane. As a closed and not necessarily smooth curve 
is a connected set of points which serves to de-limit two areas of the plane, 
which we shall call the interior and exterior regions relative to that curve, we
ought first to define a curve C in the complex plane. It is frequently convenient
to give a parametric representation by expressing C as the set of points 

$$z = x(s) + iy(s) \quad \text{for } a \le s \le b, \tag{10-1}$$

where x(s) and y(s) are continuous real functions of the parameter s. It 
should be apparent from Section 2-5 and subsequent work that the require- 
ment of continuity for the real functions x(s), y(s) will ensure that C is a 
continuous curve (that is, unbroken), but that it does not necessarily possess 
a tangent at every point. As a simple illustration C might be a rectangle, for 
then tangents would not be defined at the corners though the curve would be 
continuous everywhere. We shall return to these general matters later when 
a continuous function of a complex variable has been defined. For 
conciseness let us henceforth call such curves C, continuous curves. 

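For a less trivial example, suppose that the curve C in the complex plane
is defined by $z = x(s) + iy(s)$, where

$$x(s) = \sin s \quad \text{for } -\tfrac{1}{2}\pi \le s \le \tfrac{3}{2}\pi,$$

$$y(s) = \begin{cases} \sin^2 s & \text{for } -\tfrac{1}{2}\pi \le s \le \tfrac{1}{2}\pi, \\ 1 & \text{for } \tfrac{1}{2}\pi \le s \le \tfrac{3}{2}\pi. \end{cases}$$

Fig. 10-1 Continuous curve C having no tangent defined at P and Q.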

Then it is readily seen that C is the continuous closed curve comprising the
parabola $y = x^2$ in the interval $-1 \le x \le 1$, together with the points of the
line $y = 1$ common to that same interval. The curve C is shown in Fig. 10-1
and it is continuous everywhere, though it is not smooth everywhere for no
tangent can be defined at points P and Q. The darkly shaded area in that
Figure comprises points which are interior relative to C and form the interior
region, whilst the lightly shaded area comprises points which are exterior 
relative to C and form the exterior region. When speaking in terms of regions, 
the points comprising the curve C itself are usually called the boundary 
points and they may, or may not, belong to a region. 

A parametric representation of a curve C is not always the most convenient 
method for its description in the complex plane and, on occasions, it is better 
to identify the points z comprising a curve directly in terms of z itself. When 
necessary, regions are usually defined in the complex plane by means of a 
combination of curves and inequalities, as was done in the real plane. 

Example 10-4 Describe the curve C defined by the equation

$$|z - 2| = \tfrac{3}{2}$$

and use the result to define the region exterior to C.

Solution This expression defines a connected set of points that all have a 
modulus 3/2 relative to the point z = 2 as origin, that is to say, the set of 
points which are all distant 3/2 from the point z = 2. Hence the equation 
| z — 2 | =3/2 describes a circle C of radius 3/2 centred on the point z = 2. 
Algebraically, the same result is obtained by writing $z = x + iy$, when
$|z - 2| = |(x - 2) + iy|$, so that from the definition of the modulus of a
complex number, $|z - 2| = 3/2$ is seen to be equivalent to the algebraic
equation $(x - 2)^2 + y^2 = 9/4$. This is a circle of radius 3/2 centred on the
point (2, 0). The region exterior to C is the entire complex plane less the
points lying in and on this circle.

Example 10-5 Describe the region interior to and including the curve C 
defined by 

$$\arg(z - 1) - \arg(z - i) = \tfrac{1}{2}\pi,$$

and also satisfying the inequalities 

i < Re z < f and Im z > 0. 

Solution Consider the construction in Fig. 10-2 (a) in which P is the point 
z = 1, Q is the point z = i and R is a general point z. 

Simple geometrical arguments then establish that the angle $\gamma$ is related
to the angles $\alpha$ and $\beta$ by the equation

$$\gamma = \pi + \alpha - \beta.$$

However, the line PR is the vector $z - 1$, whilst the line QR is the vector
$z - i$, so that $\arg(z - i) = \alpha$ and $\arg(z - 1) = \beta$. Since by the conditions
of the problem we must have $\beta - \alpha = \tfrac{1}{2}\pi$, it follows that $\gamma = \tfrac{1}{2}\pi$. The angle
QRP is thus a right angle and hence the curve C must be a semi-circle drawn
from P to Q with PQ as its diameter. The semi-circle must lie above the
diameter PQ, since were the general point R to be taken below that line the
equation relating the arguments would no longer be satisfied. To define
the lower semi-circle the following condition would be needed:

$$\arg(z - 1) - \arg(z - i) = -\tfrac{1}{2}\pi.$$

Fig. 10-2 Region in complex plane: (a) boundary curve; (b) region interior to C
and satisfying stated inequalities.

To complete the solution to the problem it is now necessary to interpret 
the inequalities. The inequality \ < Re z < f describes the narrow strip 
bounded by the lines x = J and x = f , with the points of the line x = J 
excluded from consideration. The inequality $\operatorname{Im} z \ge 0$ is the half plane above
and including the x-axis itself. Figure 10-2 (b) presents a composite diagram
with the shaded area representing the region satisfying all the conditions of 
the problem. Boundary points belonging to the region are indicated by a 
heavy line and those excluded by a dotted line. 

Notice from this and the previous example that there is more than one 
way of specifying a given curve and region. The condition 

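$$\arg(z - 1) - \arg(z - i) = \tfrac{1}{2}\pi$$

is an alternative expression of the condition

$$\left|z - \tfrac{1}{2} - \tfrac{1}{2}i\right| = \frac{1}{\sqrt{2}}$$

with Re z > 0, Im z > 0,

which, in turn, is an alternative expression of the algebraic condition

$$(x - \tfrac{1}{2})^2 + (y - \tfrac{1}{2})^2 = \tfrac{1}{2}$$

with x > 0, y > 0.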
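The equivalence of the argument condition and the circle condition can be checked numerically. The sketch below samples points on the arc of the circle with centre ½ + ½i and radius 1/√2 that lies in the open first quadrant, and evaluates the difference of principal arguments at each one. The parametrisation by the angle θ and all names are our own assumptions, introduced only for this check.

```python
import cmath
import math

centre = 0.5 + 0.5j
radius = 1 / math.sqrt(2)

def arg_difference(z):
    # arg(z - 1) - arg(z - i), using principal values of the argument
    return cmath.phase(z - 1) - cmath.phase(z - 1j)

# theta = -pi/4 gives the point z = 1 (P) and theta = 3*pi/4 gives z = i (Q);
# the angles sampled here lie strictly between those two end points.
thetas = [-math.pi / 4 + k * math.pi / 10 for k in range(1, 10)]
arc_points = [centre + radius * cmath.exp(1j * t) for t in thetas]
differences = [arg_difference(p) for p in arc_points]

for p, d in zip(arc_points, differences):
    print(p, d)   # each difference should be close to pi/2
```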




10-3 Function of a complex variable, limits, and continuity

In Chapter 2 we used the term 'a real valued function of a real variable' to 
mean any rule that associates with each real number from the domain of 
definition of the function a unique real number from the range of that
function. Symbolically, if D denotes the set of points in the domain of a function
f, and R denotes the set of points in the range of f, this relationship or mapping
is given by

$$R = f(D).$$

These ideas still hold good when the domain D and the range R include 
complex numbers. Thus if z is any point in D, and w is the unique number 
assigned to z by the function f, we write

$$w = f(z). \tag{10-2}$$

The number z = x + iy is allowed to assume any value in D and so, if 
desired, could be called a complex independent variable, when w could then 
properly be called a complex dependent variable. Usually we shall simply 
refer to z and w as complex variables. It must be appreciated that, like z, the 
variable w has a real part and an imaginary part, both of which are in general 
dependent on x and y through the variable z = x + iy. We summarize these 
ideas formally as follows. 

Definition 10-3 (function of a complex variable) We shall say that f is a
function of the complex variable $z = x + iy$, and write

$$w = f(z),$$

if f associates a unique complex number $w = u + iv$ with each complex
number z belonging to some region D of the complex plane.

Specific examples of functions of a complex variable are : 

(a) $w = iz + 1$; (b) $w = z\bar{z}$; (c) $w = z^2 + 2z + 1$; (d) $w = 1/(z - 2)$;
(e) $w = \sin z$.

With the exception of (d), which is not defined for z = 2, these functions are 
defined for all z. 

The difference between a function of a complex variable and a real valued 
function of a real variable is made clear by expressing these examples in real 
and imaginary form. Thus writing z = x + iy and w = u + iv we find: 

(a) $w = i(x + iy) + 1 = (1 - y) + ix$, showing that $u = 1 - y$, $v = x$;

(b) $w = (x + iy)(x - iy) = x^2 + y^2$, showing that $u = x^2 + y^2$, $v = 0$.
This is an example of a function that always maps a complex variable
into a real variable.

(c) $w = (x + iy)^2 + 2(x + iy) + 1 = (x^2 + 2x - y^2 + 1) + i(2y + 2xy)$,
showing that $u = x^2 + 2x - y^2 + 1$, $v = 2y(1 + x)$;

(d) $w = 1/(x + iy - 2) = [(x - 2) - iy]/(x^2 + y^2 - 4x + 4)$, showing
that $u = (x - 2)/(x^2 + y^2 - 4x + 4)$, $v = -y/(x^2 + y^2 - 4x + 4)$,
provided only that x = 2 and y = 0 do not hold simultaneously;

(e) $w = \sin z = \sin(x + iy) = \sin x \cos iy + \cos x \sin iy$, and so using
the results of Problem 6-33, that $\cos iy = \cosh y$, $\sin iy = i \sinh y$,
we arrive at $w = \sin x \cosh y + i \cos x \sinh y$. Thus in this case
$u = \sin x \cosh y$, $v = \cos x \sinh y$.
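The real and imaginary parts found in (a) to (e) can be compared with direct complex arithmetic at a sample point. This is a sketch only: the test point and the dictionary layout are arbitrary choices of ours.

```python
import cmath
import math

x, y = 0.7, -1.3          # arbitrary sample point
z = complex(x, y)
d = x * x + y * y - 4 * x + 4   # common denominator in case (d)

# Each entry pairs the value computed directly from z with the value
# built from the stated real and imaginary parts u(x, y) and v(x, y).
cases = {
    "iz + 1": (1j * z + 1, complex(1 - y, x)),
    "z * conj(z)": (z * z.conjugate(), complex(x * x + y * y, 0)),
    "z^2 + 2z + 1": (z * z + 2 * z + 1,
                     complex(x * x + 2 * x - y * y + 1, 2 * y * (1 + x))),
    "1/(z - 2)": (1 / (z - 2), complex((x - 2) / d, -y / d)),
    "sin z": (cmath.sin(z),
              complex(math.sin(x) * math.cosh(y), math.cos(x) * math.sinh(y))),
}

for name, (direct, from_parts) in cases.items():
    print(name, abs(direct - from_parts))
```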

Any function of x, y and complex constants that gives rise to a unique 
complex number when x and y are specified defines a function of the complex 
variable z by virtue of the relationship z = x + iy. For suppose that 

$$(x + y + 1) + i(x - 2y) = f(z),$$

then to determine $f(z)$ when $z = 1 + 2i$ we simply write $x + iy = 1 + 2i$,
showing that $x = 1$, $y = 2$, after which it follows from the form of $f(z)$
that $f(1 + 2i) = 4 - 3i$.

Our Definition 10-1 of a limit of a sequence of complex numbers extends 
without difficulty to include the concept of a limit of a function of a complex 
variable. In essence, we shall say that $f(z)$ has the limit $w_0$ as $z \to z_0$ and will
write

$$\lim_{z\to z_0} f(z) = w_0 \tag{10-3}$$

when, for any small $\varepsilon > 0$, we can always ensure that $|f(z) - w_0| < \varepsilon$ by
confining z to some suitably small circular neighbourhood $|z - z_0| < \delta$
of the point $z_0$. That is to say $f(z)$ can be made arbitrarily close to $w_0$ by
taking z sufficiently close to $z_0$, irrespective of the manner of approach of
z to $z_0$. As in the real variable case, we do not require that $f(z)$ be defined at
$z_0$ or, if it is, that $f(z_0)$ should equal $w_0$. Expressed formally this becomes:

Definition 10-4 (limit of a function of a complex variable) The function
$f(z)$ will be said to tend to the limit $w_0$ as $z \to z_0$, and we shall write

$$\lim_{z\to z_0} f(z) = w_0,$$

if, and only if, for any $\varepsilon > 0$ there exists a $\delta > 0$ such that

$$|f(z) - w_0| < \varepsilon \quad\text{when}\quad |z - z_0| < \delta \text{ with } z \ne z_0.$$

This form of statement should be compared with that in Definition 3-8
relating to a real valued function of two real variables. There is no essential
difference, since the complex modulus is equal to the distance function $\rho$
used in that definition.






Example 10-6 Prove that

$$\lim_{z\to 1+i} (z^2 + 1) = 1 + 2i.$$

Solution The result is self-evident since the function $f(z) = z^2 + 1$ is
uniquely defined for all z, but let us prove it using Definition 10-4.

$$|f(z) - (1 + 2i)| = |z^2 - 2i| = |[z - (1 + i)][z + (1 + i)]|$$

and from the properties of the modulus this becomes

$$|f(z) - (1 + 2i)| = |z - 1 - i| \cdot |z + 1 + i| = |z - 1 - i| \cdot |z - 1 - i + 2(1 + i)| \le |z - 1 - i|\,\{|z - 1 - i| + 2|1 + i|\}.$$

Hence we may make $|f(z) - (1 + 2i)| < \varepsilon$, where $\varepsilon > 0$ is arbitrarily
small, provided that we choose the number $\delta > 0$ such that $|z - 1 - i| < \delta$
and $\delta\{\delta + 2|1 + i|\} < \varepsilon$. The conditions of Definition 10-4 are satisfied,
thereby establishing that $1 + 2i$ is the limit. In other words, as z approaches
the value $1 + i$, so the function $f(z) = z^2 + 1$ approaches the number
$1 + 2i$, which is its limit. In this case it also happens to be true that
$\lim_{z\to z_0} f(z) = f(z_0)$.
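The choice of δ in Example 10-6 can be made explicit and then tested numerically. In the sketch below we assume the particular choice δ = min(1, ε/4), which is one convenient way (ours, not the text's) of satisfying δ{δ + 2|1 + i|} < ε, since δ ≤ 1 gives δ + 2√2 < 4. Points are sampled on a circle just inside |z − z₀| = δ.

```python
import cmath
import math

def f(z):
    return z * z + 1

z0 = 1 + 1j
limit = 1 + 2j

def delta_for(eps):
    # Assumed choice: with delta <= 1 we have delta + 2*sqrt(2) < 4,
    # so delta * (delta + 2*|1 + i|) < 4 * delta <= eps.
    return min(1.0, eps / 4)

def worst_error(eps, samples=360):
    # Largest |f(z) - limit| over sample points with |z - z0| slightly below delta.
    r = 0.999 * delta_for(eps)
    return max(
        abs(f(z0 + r * cmath.exp(2j * math.pi * k / samples)) - limit)
        for k in range(samples)
    )

for eps in (1.0, 0.1, 1e-3):
    print(eps, delta_for(eps), worst_error(eps))
```

Sampling cannot replace the inequality argument of the Solution; it merely confirms that the bound behaves as claimed at the points tried.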



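Example 10-7 Prove that

$$\lim_{z\to 2i} \left(\frac{z^2 + 4}{z - 2i}\right) = 4i.$$

Solution Unlike the previous situation, the function $f(z) = (z^2 + 4)/(z - 2i)$
is not defined when $z = 2i$. To establish the desired result we notice that

$$|f(z) - 4i| = \left|\frac{z^2 + 4}{z - 2i} - 4i\right| = \left|\frac{z^2 + 4 - 4iz - 8}{z - 2i}\right| = \left|\frac{z^2 - 4iz - 4}{z - 2i}\right| = \left|\frac{(z - 2i)^2}{z - 2i}\right| = |z - 2i|.$$

Thus we can ensure that $|f(z) - 4i| < \varepsilon$ by taking $|z - 2i| < \delta$, where
here $\delta = \varepsilon$. The conditions of Definition 10-4 are satisfied, and thus we have
established that

$$\lim_{z\to 2i} \left(\frac{z^2 + 4}{z - 2i}\right) = 4i,$$

despite the fact that the function $f(z) = (z^2 + 4)/(z - 2i)$ is not defined at
$z = 2i$.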

The results of Theorem 10-2 generalize to give limit theorems for functions 
of a complex variable. 



Theorem 10-3 (operations on limits of complex functions) If $f(z)$ and
$g(z)$ are two complex functions for which

$$\lim_{z\to z_0} f(z) = v_0 \quad\text{and}\quad \lim_{z\to z_0} g(z) = w_0,$$

then

(a) $\lim_{z\to z_0} [f(z) + g(z)] = v_0 + w_0$;

(b) $\lim_{z\to z_0} f(z)\,g(z) = v_0 w_0$;

(c) provided $w_0 \ne 0$, $\lim_{z\to z_0} [f(z)]/[g(z)] = v_0/w_0$.

The proofs of these results follow directly from Definition 10-4 and are 
left to the reader. 

Example 10-8 Apply the results of Theorem 10-3 to the functions $f(z) = z^2 + 2z + 1$
and $g(z) = 1 - iz$ to determine the limits of $f(z) + g(z)$,
$f(z)\,g(z)$, and $f(z)/g(z)$ as $z \to i$.

Solution The functions $f(z)$ and $g(z)$ are defined for all z and so it is easily
seen that

$$\lim_{z\to i} f(z) = \lim_{z\to i} (z^2 + 2z + 1) = 2i$$

and

$$\lim_{z\to i} g(z) = \lim_{z\to i} (1 - iz) = 2.$$

These results, which have been obtained by direct substitution, may be verified
by using Definition 10-4, as in Example 10-6. Results (a), (b), and (c) of
Theorem 10-3 may thus be applied to yield:

(a) $\lim_{z\to i} [f(z) + g(z)] = 2(1 + i)$;

(b) $\lim_{z\to i} f(z)\,g(z) = 4i$;

(c) as $\lim_{z\to i} g(z) = 2 \ne 0$, $\lim_{z\to i} f(z)/g(z) = i$.

It is now a simple step to extend the idea of continuity for, as with real
valued functions of a real variable (cf. Definition 3-9), we shall say that the
function $f(z)$ is continuous at $z_0$ if $\lim_{z\to z_0} f(z) = w_0$ exists and $f(z_0) = w_0$. We
thus arrive at the following statement.

Definition 10-5 (continuity of a function of a complex variable) The
complex function $f(z)$ will be said to be continuous at $z_0$ if:

(a) $\lim_{z\to z_0} f(z) = w_0$ exists,

and

(b) $f(z_0) = w_0$.

A complex function will be said to be continuous in a region of the complex 
plane if it is continuous at all points of that region. 

Example 10-9 Prove that the function /(z) = a + bz is continuous 
everywhere. 

Solution If $z_0$ is any complex number we have

$$|f(z) - f(z_0)| = |a + bz - a - bz_0| = |b|\,|z - z_0|,$$

so that for any $\varepsilon > 0$,

$$|f(z) - f(z_0)| < \varepsilon \quad\text{if}\quad |z - z_0| < \delta$$

provided that we take $\delta = \varepsilon/|b|$ when $b \ne 0$ (when $b = 0$ any $\delta > 0$ will serve).
We have proved that

$$\lim_{z\to z_0} f(z) = a + bz_0,$$

which is condition (a) of Definition 10-5. Condition (b) is obviously true as
$f(z_0) = a + bz_0$ for all $z_0$. As $z_0$ was arbitrary, it follows that we have
proved the required property of continuity for $f(z)$. Notice that by first setting
$b = 0$ and then setting $a = 0$, $b = 1$, the continuity of the functions $f(z) = a$
(constant) and $f(z) = z$ follows as special cases.

Example 10-10 Prove that the function $f_n(z) = z^n$, where n is a positive
integer, is continuous for all z.

Solution The proof is by induction. In the previous example we proved as a
special case the continuity of $f_1(z) = z$. If we assume that $f_m(z)$ is continuous,
then since $f_{m+1}(z) = z^{m+1} = z \cdot z^m = f_1(z) \cdot f_m(z)$, it follows directly from
Theorem 10-3 (b) that $f_{m+1}(z)$ is continuous. Thus if $P(m)$ is the property that
$f_m(z)$ is continuous, we have proved directly that $P(1)$ is true and also that if
$P(m)$ is true, then so also is $P(m + 1)$. Hence it follows by induction that
$P(m)$ is true for all m, which establishes our result.

Further use of Definition 10-5 coupled with Theorem 10-3 makes it a
straightforward matter to establish many other important and useful results
concerning continuity. Typical of results that follow from such reasoning
are that a complex polynomial

$$P(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n$$

is continuous everywhere, whilst a complex rational function

$$\frac{a_0 + a_1 z + a_2 z^2 + \cdots + a_m z^m}{b_0 + b_1 z + b_2 z^2 + \cdots + b_n z^n}$$

is continuous everywhere except at the n zeros of the denominator.

It is interesting to give an alternative proof of the continuous nature of a
polynomial $P(z)$. As $z = x + iy$, it follows that we may express $P(z)$ in the
form

$$P(z) = Q_1(x, y) + iQ_2(x, y),$$

where $Q_1(x, y)$ and $Q_2(x, y)$ are real polynomial functions each with general
terms of the form $x^s y^t$ in which s, t are either zero or positive integers. Now
from the behaviour of real functions of two real variables we know that
$Q_1(x, y)$ and $Q_2(x, y)$ must be continuous functions of x and y everywhere
in the plane.

However, if $z_1$ and $z_2$ are any two points with $z_1 = x_1 + iy_1$ and
$z_2 = x_2 + iy_2$, then

$$|P(z_2) - P(z_1)| = |Q_1(x_2, y_2) - Q_1(x_1, y_1) + i[Q_2(x_2, y_2) - Q_2(x_1, y_1)]| \le |Q_1(x_2, y_2) - Q_1(x_1, y_1)| + |Q_2(x_2, y_2) - Q_2(x_1, y_1)|.$$

Now as $Q_1(x, y)$ and $Q_2(x, y)$ are continuous, it is true that

$$\lim_{\substack{x_2\to x_1 \\ y_2\to y_1}} Q_1(x_2, y_2) = Q_1(x_1, y_1) \quad\text{and}\quad \lim_{\substack{x_2\to x_1 \\ y_2\to y_1}} Q_2(x_2, y_2) = Q_2(x_1, y_1),$$

and so $|P(z_2) - P(z_1)|$ may be made arbitrarily small by taking $z_2$ sufficiently
close to $z_1$. This proves our assertion of the continuity of $P(z)$ for all z, since
$z_1$, $z_2$ were arbitrary points in the complex plane.

Obvious extensions of the other continuity theorems proved for real 
variables are also possible and the most useful ones are summarized below 
without further proof. 

Theorem 10-4 (continuity theorem for complex functions) If $f(z)$ and
$g(z)$ are two complex functions each continuous at $z = z_0$, then

(a) $f(z) + g(z)$ is continuous at $z_0$;

(b) $f(z)\,g(z)$ is continuous at $z_0$;

(c) $f(z)/g(z)$ is continuous at $z_0$ provided $g(z_0) \ne 0$;

(d) if $f(w)$ is continuous at $w = w_0$, and $w = g(z)$ is continuous at $z = z_0$,
with $w_0 = g(z_0)$, then the composite function (function of a function)
$f[g(z)]$ is continuous at $z = z_0$.

It is, for example, condition (d) of this theorem that validates the assertion 
that (z 2 + 3z + 2) 3 is continuous everywhere. (Why ?) 




10-4 Derivatives — Cauchy-Riemann equations 

Thus far the related concepts of a limit, a function, and of continuity 
have been successfully extended to include a function of a complex variable. 
It is now reasonable to attempt to generalize the notion of a derivative, and 
at this point we encounter a major dissimilarity between a function of a 
complex variable and a real valued function of two real variables. Indeed, 
whereas we have already seen that most real valued functions of two real 
variables are partially differentiable with respect to those variables, it will 
shortly be shown that the operation of differentiation can only be defined 
for a very special class of complex functions. Before discovering the exact 
nature of the restriction on a complex function if it is to be differentiable, 
we must extend our definition of a derivative in a manner compatible with 
the real variable case. 

Definition 10-6 (derivative of a complex function) Let $w = f(z)$ be
defined in some neighbourhood of the point $z = z_0$ and let $|h|$ be sufficiently
small for $z = z_0 + h$ to lie within this neighbourhood. Then, if the difference
quotient

$$\frac{f(z_0 + h) - f(z_0)}{h}$$

tends to the limit $\gamma$ as $|h| \to 0$, we shall call $\gamma$ the derivative of $f(z)$ at $z_0$ and
will write either

$$f'(z_0) = \gamma \quad\text{or}\quad \left.\frac{dw}{dz}\right|_{z = z_0} = \gamma.$$

If this difference quotient has a limit for all points $z_0$ of some region in which
$w = f(z)$ is defined, then $f(z)$ will be said to be differentiable in that region.
The derivative, as a function of a general point z, will be denoted either by
$f'(z)$ or $dw/dz$.

Alternatively expressed, this definition asserts that the complex number
$\gamma$ is the derivative of $w = f(z)$ at $z = z_0$ if, for every $\varepsilon > 0$, there exists a $\delta$
such that

$$\left|\frac{f(z_0 + h) - f(z_0)}{h} - \gamma\right| < \varepsilon \quad\text{for } |h| < \delta.$$

Notice that although h is small when $|h|$ is small, the condition $|h| \to 0$
that is imposed in our definition of a derivative requires the limit defining
the derivative to exist for all possible methods of approach of h towards zero.
This means that if the derivative is to exist, then it must be independent of
the manner in which $h \to 0$. This is a vitally important feature of the definition
and one to which we shall return.



Example 10-11 Prove that if $n > 0$ is an integer and $w = z^n$, then

$$\frac{dw}{dz} = nz^{n-1}$$

for all z.

Solution Consider some point $z_0$ and form the difference quotient

$$\frac{(z_0 + h)^n - z_0^n}{h},$$

where h is any complex number. Then by the binomial theorem

$$\frac{(z_0 + h)^n - z_0^n}{h} = \frac{z_0^n + nhz_0^{n-1} + \dfrac{n(n - 1)}{2!}h^2 z_0^{n-2} + \cdots + h^n - z_0^n}{h}$$

and thus

$$\frac{(z_0 + h)^n - z_0^n}{h} = nz_0^{n-1} + \frac{n(n - 1)}{2!}hz_0^{n-2} + \cdots + h^{n-1}.$$

Now as $|h| \to 0$ implies $h \to 0$, taking the limit of this expression as $|h| \to 0$
we arrive at the derivative of the function $w = z^n$ at the point $z_0$:

$$\lim_{|h|\to 0} \left[\frac{(z_0 + h)^n - z_0^n}{h}\right] = nz_0^{n-1}.$$

Since the point $z_0$ was arbitrary this result is true for all $z_0$, and so the
function is differentiable for all z and

$$\frac{d(z^n)}{dz} = nz^{n-1}.$$

A more subtle argument shows that this result is, in fact, true for any value
of n and not just for n a positive integer.
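The independence of the limit from the direction of approach can be illustrated for $w = z^n$ by evaluating the difference quotient for small h taken along several directions. This is a numerical sketch only; the exponent, the point $z_0$, the step size and the directions are arbitrary choices of ours.

```python
import cmath
import math

def difference_quotient(f, z0, h):
    # [f(z0 + h) - f(z0)] / h for a complex increment h
    return (f(z0 + h) - f(z0)) / h

n = 5
z0 = 0.8 - 0.6j
exact = n * z0 ** (n - 1)      # the derivative n z0^(n-1) from Example 10-11

step = 1e-6
angles = (0.0, math.pi / 4, math.pi / 2, 2.0, math.pi)
directions = [cmath.exp(1j * a) for a in angles]
estimates = [difference_quotient(lambda z: z ** n, z0, step * d) for d in directions]

for a, est in zip(angles, estimates):
    print(a, est, abs(est - exact))
```

Every direction gives essentially the same value, in agreement with the requirement that the derivative be independent of the manner in which h tends to zero.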

Example 10-12 Prove that if $w = \sin z$, then

$$\frac{dw}{dz} = \cos z$$

for all z.

Solution Let $z_0$ be any value of z and form the difference quotient

$$\frac{\sin(z_0 + h) - \sin z_0}{h},$$

where h is any complex number. Then using a familiar trigonometric identity
we have

$$\frac{\sin(z_0 + h) - \sin z_0}{h} = \frac{\sin z_0 \cos h + \cos z_0 \sin h - \sin z_0}{h} = \cos z_0\left(\frac{\sin h}{h}\right) - \sin z_0\left(\frac{1 - \cos h}{h}\right).$$

Now $w = f(z)$ will be differentiable if the limit of the right-hand side of this
expression can be shown to exist. This is most easily done by utilizing the
formal power series expansions for the sine and cosine functions, which show
that

$$\frac{\sin h}{h} = \frac{1}{h}\left[h - \frac{h^3}{3!} + \frac{h^5}{5!} - \cdots\right] = 1 - \frac{h^2}{3!} + \frac{h^4}{5!} - \cdots$$

and

$$\frac{1 - \cos h}{h} = \frac{1}{h}\left[1 - 1 + \frac{h^2}{2!} - \frac{h^4}{4!} + \cdots\right] = h\left[\frac{1}{2!} - \frac{h^2}{4!} + \cdots\right].$$

It is clear from these that because $|h| \to 0$ implies $h \to 0$, then

$$\lim_{|h|\to 0}\left(\frac{\sin h}{h}\right) = 1, \qquad \lim_{|h|\to 0}\left(\frac{1 - \cos h}{h}\right) = 0.$$

Returning to our problem, taking the limit of the difference quotient as
$|h| \to 0$ and using the above limits gives for the derivative of $w = \sin z$ at
the point $z_0$ the result

$$\lim_{|h|\to 0}\left(\frac{\sin(z_0 + h) - \sin z_0}{h}\right) = \cos z_0.$$

Once again, as $z_0$ was arbitrary, we have shown that $w = \sin z$ is
differentiable for all z, so we may write

$$\frac{d}{dz}(\sin z) = \cos z.$$

Alternative derivations of the two limits involved in this example are indicated
in Problems 10-22 and 10-23.

The following theorem is an obvious extension of Theorems 5-4 to 5-8 
relating to the real variable case. 

Theorem 10-5 (rules of differentiation) If f, g are differentiable functions
in some region, then throughout that region:

(a) $\dfrac{d}{dz}[f(z) + g(z)] = \dfrac{df}{dz} + \dfrac{dg}{dz}$ (Derivative of Sum);

(b) $\dfrac{d}{dz}[f(z)\,g(z)] = f(z)\dfrac{dg}{dz} + g(z)\dfrac{df}{dz}$ (Derivative of Product);

(c) $\dfrac{d}{dz}\left[\dfrac{f(z)}{g(z)}\right] = \dfrac{g(z)(df/dz) - f(z)(dg/dz)}{[g(z)]^2}$ provided $g(z) \ne 0$
(Derivative of Quotient);

(d) $\dfrac{d}{dz}\{f[g(z)]\} = f'[g(z)]\,g'(z)$ or, by writing $u = g(z)$, this takes the
form $\dfrac{d}{dz}\{f[g(z)]\} = \dfrac{df}{du}\,\dfrac{du}{dz}$ (Chain Rule);

(e) $f(z)$ and $g(z)$ are continuous functions of z (Differentiability implies
Continuity).

Proof All these results may be established directly from the definition of a 
derivative by arguments that are essentially similar to the real variable case. 
We give the proofs of (a) and (e) as illustrations. 
Result (a) follows because

$$\frac{d}{dz}[f(z) + g(z)] = \lim_{|h|\to 0}\left[\frac{f(z + h) + g(z + h) - f(z) - g(z)}{h}\right] = \lim_{|h|\to 0}\left[\frac{f(z + h) - f(z)}{h}\right] + \lim_{|h|\to 0}\left[\frac{g(z + h) - g(z)}{h}\right] = \frac{df}{dz} + \frac{dg}{dz}.$$

Result (e) follows because differentiability of a function $f(z)$ requires the
difference quotient $[f(z + h) - f(z)]/h$ to have a limit as $|h| \to 0$, which in
turn requires that $|f(z + h) - f(z)| \to 0$ as $|h| \to 0$. This is just the formal
statement that $f(z)$ is continuous and so our assertion is proved.

Example 10-13 Use the derivatives established so far together with Theorem
10-5 to differentiate the functions:

(a) $w = z^2 + 3\sin z$;

(b) $w = z^3 \sin z$;

(c) $w = 1/(1 + z)$;

(d) $w = \sin(z^2 + z + 3)$.

Solution

(a) Using Theorem 10-5 (a) with $f(z) = z^2$, $g(z) = 3\sin z$ we obtain

$$\frac{dw}{dz} = 2z + 3\cos z$$

for all z.

(b) Using Theorem 10-5 (b) with $f(z) = z^3$, $g(z) = \sin z$ we obtain

$$\frac{dw}{dz} = z^3 \cos z + 3z^2 \sin z$$

for all z.

(c) Using Theorem 10-5 (c) with $f(z) = 1$, $g(z) = (1 + z)$ we obtain

$$\frac{dw}{dz} = \frac{-1}{(1 + z)^2}$$

for $z \ne -1$.

(d) Writing $w = \sin u$, where $u = z^2 + z + 3$, enables us to apply the
chain rule (Theorem 10-5 (d)):

$$\frac{dw}{dz} = \frac{d}{du}(\sin u)\,\frac{du}{dz} = (\cos u)(2z + 1),$$

whence

$$\frac{dw}{dz} = (2z + 1)\cos(z^2 + z + 3).$$
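The four derivatives of Example 10-13 can be compared with a central difference estimate at a sample point. The sketch below is ours: the helper `numeric_derivative`, the step size and the test point are assumptions made for illustration. The increment is taken along the real direction only, which suffices here because each function is differentiable at the point in question.

```python
import cmath

def numeric_derivative(f, z, h=1e-6):
    # Central difference estimate of f'(z) using a real increment h.
    return (f(z + h) - f(z - h)) / (2 * h)

# Pairs (function, derivative claimed in Example 10-13).
pairs = [
    (lambda z: z**2 + 3 * cmath.sin(z), lambda z: 2 * z + 3 * cmath.cos(z)),
    (lambda z: z**3 * cmath.sin(z),
     lambda z: z**3 * cmath.cos(z) + 3 * z**2 * cmath.sin(z)),
    (lambda z: 1 / (1 + z), lambda z: -1 / (1 + z) ** 2),
    (lambda z: cmath.sin(z**2 + z + 3),
     lambda z: (2 * z + 1) * cmath.cos(z**2 + z + 3)),
]

z = 0.4 + 0.9j   # arbitrary point, chosen away from z = -1
errors = [abs(numeric_derivative(f, z) - df(z)) for f, df in pairs]
print(errors)
```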

Let us now explore more carefully the implications of the requirements of 
differentiability. This is perhaps best prefaced by an illustration of a simple 
function of a complex variable that is not differentiable. 

We shall attempt to compute the derivative at $z_0 = 0$ of the function
$f(z) = \bar{z}$, where $z = x + iy$. We have $f(z) = x - iy$, from which it follows
that $f(0) = 0$, so that in computing the required derivative we are led to
consider the behaviour of the difference quotient

$$\frac{f(0 + h) - f(0)}{h} = \frac{f(h) - 0}{h} = \frac{\bar{h}}{h}$$

as $|h| \to 0$. Writing $h = \alpha + i\beta$ this becomes

$$\frac{\bar{h}}{h} = \frac{\alpha - i\beta}{\alpha + i\beta} = \frac{\alpha^2 - \beta^2}{\alpha^2 + \beta^2} - i\left(\frac{2\alpha\beta}{\alpha^2 + \beta^2}\right).$$

Obviously this expression can have no limit as $|h| \to 0$ because the result is
dependent on the manner of approach of h to zero. To see this we need take
only two special cases:

(a) if $\alpha = 0$, and $\beta \to 0$, then

$$\lim_{\substack{\beta\to 0 \\ \alpha = 0}} \left(\frac{\bar{h}}{h}\right) = -1,$$

whereas (b) if $\beta = 0$, $\alpha \to 0$, then

$$\lim_{\substack{\alpha\to 0 \\ \beta = 0}} \left(\frac{\bar{h}}{h}\right) = 1.$$

The limit thus depends on the manner in which $h \to 0$ so that no derivative
exists in the sense of Definition 10-6.
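The dependence on the direction of approach is easy to exhibit numerically. Writing $h = |h|e^{i\theta}$ gives $\bar{h}/h = e^{-2i\theta}$, which does not involve $|h|$ at all. The sketch below (function names and the size of h are our own choices) evaluates the quotient along the real axis, the imaginary axis and the diagonal.

```python
import cmath
import math

def conj_quotient(h):
    # Difference quotient of f(z) = conj(z) at z0 = 0, namely conj(h)/h.
    return h.conjugate() / h

def quotient_along(angle, size=1e-8):
    # Take h = size * exp(i * angle): a small increment in a chosen direction.
    return conj_quotient(size * cmath.exp(1j * angle))

along_real = quotient_along(0.0)              # case beta = 0: value 1
along_imag = quotient_along(math.pi / 2)      # case alpha = 0: value -1
along_diagonal = quotient_along(math.pi / 4)  # case alpha = beta: value -i

print(along_real, along_imag, along_diagonal)
```

Three directions give three different values, however small h is taken, so no single limit can exist.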

Obviously some conditions must be devised such that it is possible to
decide, without appeal to Definition 10-6, whether or not a given function
$f(z)$ has a unique derivative, that is, whether or not the limit of the difference
quotient in Definition 10-6 is independent of the manner in which $h \to 0$.

Consider a function $f(z)$, assumed to be differentiable in some region,
and express it in the form

$$f(z) = u + iv, \tag{10-4}$$

where u, v are functions of x and y by virtue of the relationship $z = x + iy$.
(Cf. the illustrative examples (a) to (e) following Definition 10-3.) Let us now
compute the derivative of $f(z)$ and, in doing so, appeal to Fig. 10-3.

Fig. 10-3 Derivative of a complex function.



As $f(z)$ is assumed to be differentiable, we shall choose an arbitrary h as
shown in the Figure and allow it to tend to zero along the line QP inclined
at an angle $\alpha$ to the x-axis. Then if $h = \lambda + i\mu$, it follows that
$z + h = x + \lambda + i(y + \mu)$, and so if we also make use of the alternative representation
of h in the form $h = |h|e^{i\alpha}$, where $|h| = (\lambda^2 + \mu^2)^{1/2}$, we have

$$f'(z) = \lim_{|h|\to 0}\left[\frac{f(z + h) - f(z)}{h}\right] = \lim_{|h|\to 0}\left[\frac{f[(x + \lambda) + i(y + \mu)] - f(x + iy)}{(\lambda^2 + \mu^2)^{1/2}e^{i\alpha}}\right]$$

or

$$f'(z) = e^{-i\alpha}\lim_{|h|\to 0}\left[\frac{u(x + \lambda, y + \mu) + iv(x + \lambda, y + \mu) - u(x, y) - iv(x, y)}{(\lambda^2 + \mu^2)^{1/2}}\right]. \tag{10-5}$$



As $f(z)$ is assumed to be differentiable, result (10-5) must be independent of
the angle $\alpha$. To see the implications of this let us first consider the real part
of the bracketed expression inside the limit (10-5) which is

$$\frac{u(x + \lambda, y + \mu) - u(x, y)}{(\lambda^2 + \mu^2)^{1/2}}. \tag{10-6}$$

By adding to and subtracting from the numerator of this expression the term
$u(x, y + \mu)$, it is soon verified, with a little manipulation, that it is equivalent
to

$$\left[\frac{u(x + \lambda, y + \mu) - u(x, y + \mu)}{\lambda}\right]\frac{\lambda}{(\lambda^2 + \mu^2)^{1/2}} + \left[\frac{u(x, y + \mu) - u(x, y)}{\mu}\right]\frac{\mu}{(\lambda^2 + \mu^2)^{1/2}}. \tag{10-7}$$



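Geometry tells us that

$$\frac{\lambda}{(\lambda^2 + \mu^2)^{1/2}} = \cos\alpha, \qquad \frac{\mu}{(\lambda^2 + \mu^2)^{1/2}} = \sin\alpha$$

so, when taken in conjunction with the fact that $|h| \to 0$ implies $\lambda \to 0$,
$\mu \to 0$, the limit of expression (10-7) as $|h| \to 0$ becomes

$$\frac{\partial u}{\partial x}\cos\alpha + \frac{\partial u}{\partial y}\sin\alpha. \tag{10-8}$$

An identical argument applied to the imaginary part of the bracketed
expression inside the limit (10-5) yields the result

$$\frac{\partial v}{\partial x}\cos\alpha + \frac{\partial v}{\partial y}\sin\alpha.$$

Hence, the limit (10-5) is equivalent to

$$f'(z) = e^{-i\alpha}\left[\left(\frac{\partial u}{\partial x}\cos\alpha + \frac{\partial u}{\partial y}\sin\alpha\right) + i\left(\frac{\partial v}{\partial x}\cos\alpha + \frac{\partial v}{\partial y}\sin\alpha\right)\right]. \tag{10-9}$$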



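For $f'(z)$ to be independent of the manner in which $h \to 0$, it follows that
Eqn (10-9) must be independent of the value of $\alpha$. In particular, the real and
imaginary parts of this expression must be independent of $\alpha$. Expressing
$f'(z)$ in real-imaginary form we obtain

$$f'(z) = \left[\frac{\partial u}{\partial x}\cos^2\alpha + \frac{\partial v}{\partial y}\sin^2\alpha + \left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right)\sin\alpha\cos\alpha\right] + i\left[\frac{\partial v}{\partial x}\cos^2\alpha - \frac{\partial u}{\partial y}\sin^2\alpha + \left(\frac{\partial v}{\partial y} - \frac{\partial u}{\partial x}\right)\sin\alpha\cos\alpha\right]. \tag{10-10}$$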



Inspection shows that it can only be independent of $\alpha$ if both the following
conditions are satisfied:

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad\text{and}\quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \tag{10-11}$$

These are known as the Cauchy-Riemann equations and are fundamental
to the development of the theory of functions of a complex variable. An
immediate consequence of the Cauchy-Riemann equations is that Eqn
(10-10) may be written either as

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} \quad (\text{case } \alpha = 0) \tag{10-12}$$

or as

$$f'(z) = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y} \quad (\text{case } \alpha = \tfrac{1}{2}\pi). \tag{10-13}$$

It has thus been established that if a function $f(z)$ is to have a uniquely
determined derivative at a point in the sense of Definition 10-6, then it must
satisfy the Cauchy-Riemann equations (10-11).
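The Cauchy-Riemann equations (10-11) can be tested numerically by estimating the four partial derivatives with central differences. The following sketch is our own construction: the helper names, the step size and the test point are assumptions. It reports the larger of $|u_x - v_y|$ and $|u_y + v_x|$, which should be close to zero for a differentiable function such as sin z and is not small for $f(z) = \bar{z}$.

```python
import cmath

def partials(f, x, y, step=1e-6):
    # Central difference estimates of u_x, u_y, v_x, v_y for f = u + i v.
    fx = (f(complex(x + step, y)) - f(complex(x - step, y))) / (2 * step)
    fy = (f(complex(x, y + step)) - f(complex(x, y - step))) / (2 * step)
    return fx.real, fy.real, fx.imag, fy.imag

def cauchy_riemann_residual(f, x, y):
    # Largest violation of u_x = v_y and u_y = -v_x at the point (x, y).
    u_x, u_y, v_x, v_y = partials(f, x, y)
    return max(abs(u_x - v_y), abs(u_y + v_x))

residual_sin = cauchy_riemann_residual(cmath.sin, 0.3, 0.7)
residual_conj = cauchy_riemann_residual(lambda z: z.conjugate(), 0.3, 0.7)

print(residual_sin)    # close to 0: sin z satisfies (10-11)
print(residual_conj)   # close to 2: u_x = 1 but v_y = -1 for conj(z)
```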

We now check whether the converse also holds, namely whether the satisfaction
of the Cauchy-Riemann equations by a function automatically implies that the
function has a unique derivative. Let $w = u + iv$ be a function such that
u, v satisfy Eqns (10-11). Consider first the function u at some point
$z = x + iy$. We know from Chapter 5 that at a neighbouring point $z + h$ with
$h = \lambda + i\mu$, for $\Delta u = u(x + \lambda, y + \mu) - u(x, y)$ we may substitute the
expression

$$\Delta u = \frac{\partial u}{\partial x}\lambda + \frac{\partial u}{\partial y}\mu + \varepsilon_1\lambda + \eta_1\mu,$$

where $\varepsilon_1, \eta_1 \to 0$ as $\lambda, \mu \to 0$ provided that $u_x$ and $u_y$ are continuous. A
similar result is of course true for $\Delta v$, the change in v consequent upon moving
from z to $z + h$, though for $\varepsilon_1, \eta_1$ we must substitute $\varepsilon_2, \eta_2$ and require that
$v_x$, $v_y$ are continuous.

Thus if A/=/(z + h) -f{z), we have 

. . 8u , 8u (dv . 8v \ 

A/= Tx l + 8~y * + ' \Tx X + Ty * ) + ^ + ^ + ^ + W>" 

Using the Cauchy-Riemann equations this can be re-expressed as 

(8u 8v\ , 
f= \8x +i 8x'} h + (£1 + iE2)X + (m + ir,2)f *> 






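whence

$$\frac{\Delta f}{h} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} + (\varepsilon_1 + i\varepsilon_2)\left(\frac{\lambda}{h}\right) + (\eta_1 + i\eta_2)\left(\frac{\mu}{h}\right). \tag{10-14}$$

However, $|\lambda| \le |h|$, $|\mu| \le |h|$ so that

$$\left|\frac{\lambda}{h}\right| \le 1, \qquad \left|\frac{\mu}{h}\right| \le 1;$$

and as $\varepsilon_1$, $\varepsilon_2$, $\eta_1$, and $\eta_2$ all tend to zero as $\lambda, \mu \to 0$, by taking the limit of
Eqn (10-14) as $|h| \to 0$ we arrive at

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}.$$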

The fact that $f(z)$ is assumed to satisfy the Cauchy-Riemann equations
and to have continuous partial derivatives $u_x$, $u_y$, $v_x$, and $v_y$ has thus enabled
us to prove that $f(z)$ has a unique derivative. We have established the
following fundamental theorem.

Theorem 10-6 (Cauchy-Riemann theorem) If $u(x, y)$ and $v(x, y)$ have
continuous first order partial derivatives in some region, then necessary and
sufficient conditions that $f(z) = u + iv$ should have a derivative at each point
$z = x + iy$ of that region are that

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$

Results (10-12) and (10-13) may be used to deduce the form of $f'(z)$ by
using the simple observation that when $z$ is purely real, so that $z = x$, the
forms assumed by $f'(z)$ and $f'(x)$ are identical. Similarly, when $z$ is purely
imaginary, so that $z = iy$, the forms of $f'(z)$ and $f'(iy)$ are identical. This
gives the following straightforward rule for determining the derivative
$f'(z)$ of the function $f(z)$ which is sometimes helpful.

Rule 1 (Determination of the derivative of a complex function)

If $f(z) = u + iv$ satisfies the Cauchy-Riemann equations, then the derivative
$f'(z)$ expressed in terms of $z$ may either be deduced

(a) from the result

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}$$

by formally setting $y = 0$, and then replacing $x$ by $z$; or

(b) from the result

$$f'(z) = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}$$

by formally setting $x = 0$, and then replacing $iy$ by $z$.

Example 10-14 Determine which of the following functions satisfy the
Cauchy-Riemann equations and thus possess uniquely defined derivatives.
Give the form of this derivative when it is defined.

(a) $w = z^2$;

(b) $w = \cos z$;

(c) $w = |z|$.

Solution

(a) If $w = z^2$, then $w = (x + iy)^2 = x^2 - y^2 + i2xy$ and so $u = x^2 - y^2$,
$v = 2xy$. So $u_x = 2x$, $u_y = -2y$, $v_x = 2y$, and $v_y = 2x$. It is readily seen
that these expressions satisfy the Cauchy-Riemann equations and so we may
conclude that $w = z^2$ possesses a unique derivative. It follows from Eqn
(10-12) that

$$f'(z) = 2x + i2y = 2z.$$

This result was so simple that appeal to Rule 1 was not necessary.

(b) If $w = \cos z$, then $w = \cos(x + iy) = \cos x \cos iy - \sin x \sin iy$,
whence $w = \cos x \cosh y - i \sin x \sinh y$, and so $u = \cos x \cosh y$,
$v = -\sin x \sinh y$. Hence, $u_x = -\sin x \cosh y$, $u_y = \cos x \sinh y$,
$v_x = -\cos x \sinh y$ and $v_y = -\sin x \cosh y$. Here also it is immediately
apparent that the expressions satisfy the Cauchy-Riemann equations,
showing that $w = \cos z$ possesses a unique derivative.

Let us choose to work with Rule 1 (a) to determine $f'(z)$ in terms of $z$.
We must therefore start with the equation

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}.$$

In this case we find

$$f'(z) = -\sin x \cosh y - i \cos x \sinh y.$$

Then, setting $y = 0$ and replacing $x$ by $z$ gives

$$f'(z) = -\sin z.$$

It is instructive to compare this rapid method with the direct approach
we now indicate, which uses $\cos iy = \cosh y$ and $\sin iy = i \sinh y$:

$$f'(z) = -\sin x \cosh y - i \cos x \sinh y = -\sin x \cos iy - \cos x \sin iy = -\sin(x + iy) = -\sin z.$$




(c) If $w = |z|$, then $w = (x^2 + y^2)^{1/2}$, showing that $u = (x^2 + y^2)^{1/2}$,
$v = 0$. Then, as $u_x = x/(x^2 + y^2)^{1/2}$, $u_y = y/(x^2 + y^2)^{1/2}$, $v_x = v_y = 0$, the
Cauchy-Riemann equations would require $x = y = 0$, and at the origin $u_x$ and
$u_y$ are not defined. It is thus clear that $w = |z|$ cannot satisfy the
Cauchy-Riemann equations anywhere in the complex plane. We conclude that
$w = |z|$ has no derivative at any point in the complex plane.
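The contrast between parts (a), (b) and part (c) can also be seen numerically. The following is only an illustrative sketch: the helper names `directional_quotient` and `spread`, the sample point and the step size are assumptions introduced here, not part of the text. It forms the difference quotient of Definition 10-6 along several directions $\alpha$ and reports how far the results disagree.

```python
import cmath

def directional_quotient(f, z, alpha, step=1e-6):
    """Difference quotient (f(z + h) - f(z)) / h with h = step * e^{i alpha}."""
    h = step * cmath.exp(1j * alpha)
    return (f(z + h) - f(z)) / h

def spread(f, z, angles=(0.0, 0.7, cmath.pi / 2, 2.5)):
    """Largest disagreement between quotients taken along different directions."""
    q = [directional_quotient(f, z, a) for a in angles]
    return max(abs(a - b) for a in q for b in q)

z0 = 0.8 + 0.3j                        # arbitrary sample point
print(spread(lambda z: z * z, z0))     # very small: the derivative does not depend on alpha
print(spread(cmath.cos, z0))           # very small
print(spread(lambda z: abs(z), z0))    # of order one: |z| has no derivative
```

A small spread is of course only numerical evidence, not a proof; the Cauchy-Riemann test above is what settles the matter.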

Example 10-15 Determine the constants $a$ and $b$ in order that

$$w = x^2 + ay^2 - 2xy + i(bx^2 - y^2 + 2xy)$$

should satisfy the Cauchy-Riemann equations. Deduce the derivative of $w$.

Solution Here we have $u = x^2 + ay^2 - 2xy$, $v = bx^2 - y^2 + 2xy$ so that
$u_x = 2x - 2y$, $u_y = 2ay - 2x$, $v_x = 2bx + 2y$, and $v_y = -2y + 2x$. It is
certainly true that $u_x = v_y$, so that the first of the Cauchy-Riemann equations
is automatically satisfied. For the second equation to be satisfied we must
require that $u_y = -v_x$, or $2ay - 2x = -(2bx + 2y)$. This is only possible
if $a = -1$, $b = 1$.

Now as $f'(z) = u_x + iv_x$, we have

$$f'(z) = 2x - 2y + i(2x + 2y).$$

Again, working with Rule 1 (a) gives

$$f'(z) = 2(1 + i)z.$$

Had we chosen to work with Rule 1 (b) to express $f'(z)$ in terms of $z$ we should
have started from the equation

$$f'(z) = v_y - iu_y$$

which in this case becomes

$$f'(z) = -2y + 2x + i(2y + 2x).$$

Then, setting $x = 0$ and this time replacing $iy$ by $z$, we again arrive at

$$f'(z) = 2(1 + i)z.$$
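As a check on Example 10-15, integrating $f'(z) = 2(1 + i)z$ suggests that with $a = -1$, $b = 1$ the function is simply $w = (1 + i)z^2$ (the constant of integration is zero because $w = 0$ at $z = 0$). The short sketch below compares the two forms at a few points; the function names are illustrative assumptions only.

```python
def w_from_xy(x, y, a=-1.0, b=1.0):
    """w = u + iv as given in Example 10-15, evaluated from x and y."""
    u = x**2 + a * y**2 - 2 * x * y
    v = b * x**2 - y**2 + 2 * x * y
    return complex(u, v)

def w_closed_form(z):
    """Candidate closed form (1 + i) z^2 obtained by integrating 2(1 + i) z."""
    return (1 + 1j) * z * z

for x, y in [(0.3, -1.2), (2.0, 0.5), (-1.0, 1.0)]:
    print(abs(w_from_xy(x, y) - w_closed_form(complex(x, y))))   # rounding error only
```

With any other choice of $a$ or $b$ the two expressions no longer agree, which reflects the failure of the second Cauchy-Riemann equation.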

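As the complex number $z$ can also be expressed in modulus argument form
by writing $z = re^{i\theta}$, it is necessary to know the form taken by the
Cauchy-Riemann equations in terms of the variables $(r, \theta)$. This is most readily
achieved by appeal to Theorem 5-22.

It follows directly from Theorem 5-22 that:

$$\left.\begin{aligned}
\frac{\partial u}{\partial x} &= \frac{\partial r}{\partial x}\frac{\partial u}{\partial r} + \frac{\partial \theta}{\partial x}\frac{\partial u}{\partial \theta}, &\quad \frac{\partial u}{\partial y} &= \frac{\partial r}{\partial y}\frac{\partial u}{\partial r} + \frac{\partial \theta}{\partial y}\frac{\partial u}{\partial \theta}, \\
\frac{\partial v}{\partial x} &= \frac{\partial r}{\partial x}\frac{\partial v}{\partial r} + \frac{\partial \theta}{\partial x}\frac{\partial v}{\partial \theta}, &\quad \frac{\partial v}{\partial y} &= \frac{\partial r}{\partial y}\frac{\partial v}{\partial r} + \frac{\partial \theta}{\partial y}\frac{\partial v}{\partial \theta}.
\end{aligned}\right\} \tag{10-15}$$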




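In these equations $(r, \theta)$ are the polar coordinates of the point $(x, y)$ and so

$$x = r\cos\theta, \qquad y = r\sin\theta.$$

(See Eqns (4-13).)

These relationships may now be used to determine $\partial r/\partial x$, $\partial r/\partial y$, $\partial\theta/\partial x$,
$\partial\theta/\partial y$, the differentiation with respect to $x$ being performed with $y$ held
constant, and conversely. As $r^2 = x^2 + y^2$, we have

$$2r\frac{\partial r}{\partial x} = 2x \quad\text{and so}\quad \frac{\partial r}{\partial x} = \frac{x}{r} = \cos\theta, \qquad 2r\frac{\partial r}{\partial y} = 2y \quad\text{and so}\quad \frac{\partial r}{\partial y} = \frac{y}{r} = \sin\theta.$$

Similarly, as $\tan\theta = y/x$, we have

$$\sec^2\theta\,\frac{\partial\theta}{\partial x} = -\frac{y}{x^2} \quad\text{and so}\quad \frac{\partial\theta}{\partial x} = -\frac{\sin\theta}{r}, \qquad \sec^2\theta\,\frac{\partial\theta}{\partial y} = \frac{1}{x} \quad\text{and so}\quad \frac{\partial\theta}{\partial y} = \frac{\cos\theta}{r}.$$

Combination of these results with Eqns (10-15), followed by some simple
manipulation, then establishes that the polar form of the Cauchy-Riemann
equations is

$$\frac{\partial u}{\partial r} = \frac{1}{r}\frac{\partial v}{\partial\theta} \quad\text{and}\quad \frac{1}{r}\frac{\partial u}{\partial\theta} = -\frac{\partial v}{\partial r}. \tag{10-16}$$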
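As a brief worked illustration of the polar form (an example added here, not taken from the text), consider $w = z^n$ with $n$ a positive integer, for which De Moivre's theorem gives $u$ and $v$ directly in terms of $r$ and $\theta$:

```latex
\begin{align*}
w = z^{n} = r^{n}e^{in\theta}
  \quad&\Longrightarrow\quad
  u = r^{n}\cos n\theta, \qquad v = r^{n}\sin n\theta, \\
\frac{\partial u}{\partial r} = n r^{n-1}\cos n\theta
  \quad&\text{and}\quad
  \frac{1}{r}\frac{\partial v}{\partial\theta} = n r^{n-1}\cos n\theta, \\
\frac{1}{r}\frac{\partial u}{\partial\theta} = -n r^{n-1}\sin n\theta
  \quad&\text{and}\quad
  -\frac{\partial v}{\partial r} = -n r^{n-1}\sin n\theta .
\end{align*}
```

Both polar Cauchy-Riemann equations are therefore satisfied for every $r > 0$, in agreement with the fact that $z^n$ is differentiable.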

Functions $f(z)$ that are uniquely defined in some neighbourhood of a
point $z_0$ and satisfy the Cauchy-Riemann equations at $z_0$ and throughout
that neighbourhood are called either analytic or regular functions. Points at
which a function ceases to be analytic are called singularities of the function.
Thus the function $f(z) = 1/(z + 1)$ is easily seen to be analytic everywhere
except at the point $z = -1$, which is a singularity.

Supposing that $u_{xy}$, $v_{xy}$ exist and are continuous, it follows directly by
partial differentiation of the Cauchy-Riemann equations $u_x = v_y$, $u_y = -v_x$
that

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \qquad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0. \tag{10-17}$$

These equations are identical in form and are examples of an important
partial differential equation called Laplace's equation, any solution of which
is called a harmonic function. The harmonic functions $u$ and $v$ associated
with an analytic function $f(z) = u + iv$ are called conjugate harmonic
functions. For example, we have seen that

$$\cos z = \cos x \cosh y - i \sin x \sinh y$$

is an analytic function with $u = \cos x \cosh y$, $v = -\sin x \sinh y$. Now both
$u$ and $v$ are such that $u_{xy}$, $v_{xy}$ are continuous, so it follows immediately that
$u$ and $v$ satisfy Eqns (10-17). Hence $u = \cos x \cosh y$, $v = -\sin x \sinh y$ are
conjugate harmonic functions. The term conjugate is, of course, used here
in a different sense from when discussing complex conjugates.
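A rough numerical check that these two functions are harmonic can be made with the standard five-point finite-difference estimate of $u_{xx} + u_{yy}$. This is a sketch only; the helper `laplacian`, the step size and the sample point are assumptions made for illustration.

```python
import math

def laplacian(g, x, y, h=1e-3):
    """Five-point finite-difference estimate of g_xx + g_yy at (x, y)."""
    return (g(x + h, y) + g(x - h, y) + g(x, y + h) + g(x, y - h) - 4 * g(x, y)) / h**2

u = lambda x, y: math.cos(x) * math.cosh(y)
v = lambda x, y: -math.sin(x) * math.sinh(y)
not_harmonic = lambda x, y: x**2 + y**2      # comparison case: its Laplacian is 4

for name, g in [("u", u), ("v", v), ("x^2 + y^2", not_harmonic)]:
    print(name, laplacian(g, 0.4, 0.9))
```

The estimates for `u` and `v` should be close to zero, limited only by truncation and rounding error, while the comparison function gives a value close to 4.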

If $u$, $v$ are harmonic functions and we consider the analytic function
$w = u + iv$, then an obvious modification of the arguments that gave rise
to Rule 1 leads to the following rule for the expression of $w$ in terms of $z$.

Rule 2 (Expression of an analytic function in terms of z)

If $u$, $v$ are conjugate harmonic functions, then the analytic function
$w = u + iv$ expressed in terms of $z$ may be deduced either by:

(a) formally setting $y = 0$ in the expression $w = u + iv$ and then replacing
$x$ by $z$; or

(b) formally setting $x = 0$ in the expression $w = u + iv$ and then replacing
$iy$ by $z$.

Example 10-16 Show that $u = 2xy + 3y$ is harmonic and determine its
harmonic conjugate $v$. Express the functions $dw/dz$ and $w = u + iv$ in terms
of $z$.

Solution We have $u_x = 2y$, $u_{xx} = 0$, $u_y = 2x + 3$, $u_{yy} = 0$, showing that
$u_{xx} + u_{yy} = 0$. Hence $u$ is harmonic. If $v$ is to be the harmonic conjugate of
$u$ then the functions $u$, $v$ must satisfy the Cauchy-Riemann equations
$u_x = v_y$, $u_y = -v_x$.

Using the known expressions for $u_x$, $u_y$ we find that

(a) $2y = v_y$, and (b) $2x + 3 = -v_x$.

Integration then gives:

from (a),

$$v = y^2 + f(x) + \text{const},$$

from (b),

$$v = -x^2 - 3x + g(y) + \text{const},$$

where as yet $f(x)$ is an arbitrary function of $x$ and $g(y)$ is an arbitrary function
of $y$. However, as these are two alternative expressions for the same function
$v$ they must be identical, whence $f(x) = -(x^2 + 3x)$ and $g(y) = y^2$. Thus
we have arrived at the expression

$$v = y^2 - x^2 - 3x + \text{const}$$

for the function $v$, which is the harmonic conjugate of $u$.

Applying Rule 1 (a) to find $f'(z)$ requires that we start from

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}$$

or, in this case, from

$$f'(z) = 2y - i(2x + 3).$$

So, setting $y = 0$ and replacing $x$ by $z$, gives

$$f'(z) = -i(2z + 3).$$

To express $w = u + iv$ in terms of $z$ we must work with Rule 2. We have

$$w = (2xy + 3y) + i(y^2 - x^2 - 3x) + \text{const},$$

so that if we apply Rule 2 (a), we must set $y = 0$ and replace $x$ by $z$ to arrive
at

$$w = -i(z^2 + 3z) + \text{const}.$$

It is important to notice when using Rule 2 that the functions $u$ and $v$
must be conjugate harmonic functions, since otherwise they will not satisfy
the Cauchy-Riemann equations and the rule will be inapplicable. Indeed, if
the rule is applied to harmonic functions that are not conjugate, then the
functions of $z$ that are generated by Rules 2 (a) and 2 (b) may, or may not,
be identical. In neither case will the result be correct. For example,

$$u = \sin x \cosh y \quad\text{and}\quad v = \cos x \cosh y$$

are harmonic functions but they are not harmonic conjugates. Applying
Rule 2 (a) to $w = u + iv$ generates the function $w = \sin z + i\cos z$, whereas
applying Rule 2 (b) generates the function $w = i\cos z$. For a different
example, take $u = x^2 - y^2$ and $v = xy$, which are also harmonic functions
that are not conjugate. In this case both Rules 2 (a) and 2 (b) generate the
same function $w = z^2$, though of course this also is incorrect.
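The behaviour of Rule 2 on conjugate and non-conjugate pairs can be tried out mechanically. In the sketch below, which is illustrative only, the helpers `rule_2a` and `rule_2b` are names invented here; the substitution "replace $iy$ by $z$" is carried out as $y = z/i = -iz$, and the constants of integration are taken as zero.

```python
import cmath

def rule_2a(u, v):
    """Rule 2 (a): set y = 0, then read x as the complex variable z."""
    return lambda z: u(z, 0) + 1j * v(z, 0)

def rule_2b(u, v):
    """Rule 2 (b): set x = 0, then replace iy by z, that is, y = -i z."""
    return lambda z: u(0, -1j * z) + 1j * v(0, -1j * z)

# Conjugate pair from Example 10-16
u1 = lambda x, y: 2 * x * y + 3 * y
v1 = lambda x, y: y * y - x * x - 3 * x
# Harmonic but not conjugate
u2 = lambda x, y: cmath.sin(x) * cmath.cosh(y)
v2 = lambda x, y: cmath.cos(x) * cmath.cosh(y)

z = 0.3 + 0.2j
print(rule_2a(u1, v1)(z), rule_2b(u1, v1)(z))   # equal: both give -i(z^2 + 3z)
print(rule_2a(u2, v2)(z), rule_2b(u2, v2)(z))   # differ: sin z + i cos z against i cos z
```

Agreement of the two rules is necessary but, as the example $u = x^2 - y^2$, $v = xy$ shows, not sufficient for the result to be correct.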

10-5 Conformal mapping 

Thus far we have examined some of the analytical consequences of requiring
that a function $w = f(z)$ be differentiable. Let us now pursue this matter
further by studying some of the geometrical implications of differentiability.

Take two complex planes, which we shall refer to as the z-plane and the
w-plane, the connection between their respective points being through the
differentiable function $w = f(z)$. Because each value of $z$ gives rise to a unique
value of $w$, it follows that any curve $\gamma$ in the z-plane must correspond to
some other curve $\Gamma$ in the w-plane. In this sense the w-plane can correctly be
described as a mapping of the z-plane.

Fig. 10-4 Mapping by the function $w = iz + (1 + i)$.

For a specific illustration, let us determine how the straight line $y = \alpha x$
in the z-plane is mapped by the function $w = iz + (1 + i)$ onto the w-plane.
We begin by setting $w = u + iv$, $z = x + iy$, after which a simple calculation
yields $u = 1 - y$, $v = x + 1$. Hence to find the line in the w-plane that
corresponds to $y = \alpha x$ in the z-plane it is now only necessary to set $y = \alpha x$
in these expressions for $u$, $v$ and then to eliminate $x$ between them. Performing
these operations we find $u = 1 - \alpha x$, $v = x + 1$, whence

$$v = \left(\frac{1 + \alpha}{\alpha}\right) - \frac{u}{\alpha}.$$

This is again the equation of a straight line, but this time in the w-plane.
The line passes through the point $(0, (1 + \alpha)/\alpha)$ and has the gradient $-1/\alpha$.
Representative lines $\gamma_1$, $\gamma_2$ are shown in the z-plane of Fig. 10-4 and their
respective maps or images are shown as the lines $\Gamma_1$, $\Gamma_2$ in the associated
w-plane. The lines $\gamma_1$, $\gamma_2$ correspond, respectively, to $\alpha = 1$, $\alpha = 2$.

It is not difficult to see that the map in the w-plane has been obtained from
the map in the z-plane by first rotating the original pair of lines anti-clockwise
through an angle $\tfrac{1}{2}\pi$ and then translating the resulting picture so that the
original origin moves to the point $1 + i$. More important than this, however, is
the fact that the angle $\theta$ between the lines $\gamma_1$, $\gamma_2$ is equal to the angle between
the lines $\Gamma_1$, $\Gamma_2$ and, moreover, the sense of rotation is preserved. That is to say
if $\gamma_2$ is inclined to $\gamma_1$ at an angle $\theta$, measured anti-clockwise, then $\Gamma_2$ is also
inclined to $\Gamma_1$ at an angle $\theta$, measured anti-clockwise.
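The gradient and intercept found above can be reproduced by mapping two points of each line. This is a small sketch under the stated mapping $w = iz + (1 + i)$; the function name `image_line` is an assumption made for illustration.

```python
def image_line(alpha):
    """Image of the line y = alpha * x under w = i z + (1 + i).

    Returns (gradient, v_intercept) of the image line in the w-plane,
    found from the images of the two points z = 0 and z = 1 + i * alpha.
    """
    f = lambda z: 1j * z + (1 + 1j)
    w1, w2 = f(complex(0.0, 0.0)), f(complex(1.0, alpha))
    gradient = (w2.imag - w1.imag) / (w2.real - w1.real)
    intercept = w1.imag - gradient * w1.real
    return gradient, intercept

for alpha in (1.0, 2.0):
    print(alpha, image_line(alpha))   # compare with -1/alpha and (1 + alpha)/alpha
```

For $\alpha = 1$ and $\alpha = 2$ the gradients are $-1$ and $-\tfrac{1}{2}$, so each image line is perpendicular to its original, as a rotation through $\tfrac{1}{2}\pi$ requires.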

This is no chance result and, indeed, we now prove that if a function
$f(z)$ is analytic (that is, satisfies the Cauchy-Riemann equations and so has a
uniquely defined derivative) then, except for points $z_0$ at which $f'(z_0) = 0$,
the function $w = f(z)$ will preserve both the angle and the sense of rotation
when mapping intersecting curves $\gamma_1$, $\gamma_2$ in the z-plane onto corresponding
intersecting curves $\Gamma_1$, $\Gamma_2$ in the w-plane. These properties of a mapping or
transformation are recognized by saying that the transformation is conformal.

To prove this general result we now consider a function $w = f(z)$ that is
analytic in some region of the z-plane and take a point $z_0$ in that region at
which $f'(z_0) \ne 0$. Let $\gamma_1$, $\gamma_2$ be two curves drawn in the z-plane that intersect
at $z_0$ and let $z_1$ denote a point $Q$ on the curve $\gamma_1$ as indicated in Fig. 10-5,
in which $P$ denotes the point $z_0$ and $P'$ its image $w_0 = f(z_0)$. We
shall suppose that as $Q$ moves away from $P$ along $\gamma_1$ in the direction indicated
by an arrow in the Figure, so the point $w_1 = f(z_1)$, which we denote by $Q'$,
moves away from point $P'$ in the direction indicated. This process thus
associates a sense of direction with each of the corresponding curves $\gamma_1$ and
$\Gamma_1$. A similar argument defines directions along $\gamma_2$ and $\Gamma_2$.

Fig. 10-5 Conformal mapping $w = f(z)$.

Now as $Q$ approaches $P$, so the secant $PQ$ will assume its limiting position
in which, when it is inclined at an angle $\alpha_1$ to the x-axis, it is tangent to $\gamma_1$
at $z_0$. As $PQ = z_1 - z_0$ we have

$$\alpha_1 = \lim_{z_1 \to z_0} \arg(z_1 - z_0).$$

Identical reasoning shows that

$$\beta_1 = \lim_{z_1 \to z_0} \arg(w_1 - w_0),$$

where $\beta_1$ is the angle of the tangent to $\Gamma_1$ at $P'$ measured from the u-axis.
Hence we have

$$\beta_1 - \alpha_1 = \lim_{z_1 \to z_0} \arg(w_1 - w_0) - \lim_{z_1 \to z_0} \arg(z_1 - z_0)$$

and, as $\arg a - \arg b = \arg(a/b)$, this may be written

$$\beta_1 - \alpha_1 = \lim_{z_1 \to z_0} \arg\left(\frac{w_1 - w_0}{z_1 - z_0}\right).$$

However, as we are assuming $f(z)$ is differentiable,

$$f'(z_0) = \lim_{z_1 \to z_0}\left(\frac{w_1 - w_0}{z_1 - z_0}\right),$$

and provided $f'(z_0) \ne 0$ it then follows that

$$\beta_1 - \alpha_1 = \arg f'(z_0). \tag{10-18}$$

In the case that $f'(z_0) = 0$, the amplitude (argument) of $f'(z_0)$ is
indeterminate. Such points are called critical points of $f(z)$, by analogy with
the real variable case. We have seen that $f'(z_0)$ is unique, so that the
expression on the right-hand side of Eqn (10-18) is a constant. The result must,
then, also be true for any other curve $\gamma_2$, say, and its map $\Gamma_2$. Hence we have

$$\beta_1 - \alpha_1 = \beta_2 - \alpha_2$$

or

$$\alpha_2 - \alpha_1 = \beta_2 - \beta_1.$$

The curves $\gamma_1$, $\gamma_2$ were any two curves which intersected at $z_0$, so we have
proved the following result.

Theorem 10-7 (conformal mapping) If $f(z)$ is analytic in some region,
then apart from those points $z_0$ in that region for which $f'(z_0) = 0$, the
mapping $w = f(z)$ preserves both the angle and the sense of rotation when
mapping intersecting directed pairs of curves in the z-plane into corresponding
intersecting directed pairs of curves in the w-plane. Such a mapping is said
to be conformal.
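Equation (10-18) says that every short chord leaving $z_0$ is turned through the same angle $\arg f'(z_0)$. The following sketch checks this numerically for $f(z) = z^2$; the helper name `turning_angle`, the step size and the sample directions are assumptions chosen for illustration.

```python
import cmath

def turning_angle(f, z0, direction, step=1e-6):
    """arg of the image chord minus arg of the original chord, for a short
    chord of length `step` leaving z0 at the angle `direction`."""
    h = step * cmath.exp(1j * direction)
    return cmath.phase((f(z0 + h) - f(z0)) / h)

f = lambda z: z * z
z0 = 1 + 1j                                   # here f'(z0) = 2 + 2i, with argument pi/4
for direction in (0.2, 1.3, 2.9):
    print(turning_angle(f, z0, direction))    # each close to pi/4 = 0.785...

# At the critical point z0 = 0 the turning angle depends on the direction,
# so angles between curves through the origin are not preserved.
print(turning_angle(f, 0j, 0.2), turning_angle(f, 0j, 1.3))
```

Because every direction at $z_0 = 1 + i$ is turned by the same amount, the angle between any two curves through $z_0$ is unchanged; at the origin the turning angle equals the direction itself, which is the doubling of angles characteristic of $w = z^2$.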

To close this chapter we now examine some important special conformal 
mappings. Rather than emphasize the algebraic details of the transformations 
or mappings, we shall aim primarily at interpretation in terms of basic 
geometrical operations such as translation, rotation, and change of scale 
(dilatation). 

10-5 (a) The general linear transformation 

The general linear transformation is the name given to the mapping described
by the equation

$$w = az + b, \tag{10-19}$$

where $a$, $b$ are arbitrary constants with $a \ne 0$. Our introductory example was
of this form with $a = i$, $b = 1 + i$. The mapping (10-19) obviously satisfies
the Cauchy-Riemann equations and, as $dw/dz = a \ne 0$, it has no critical
points and so provides a conformal mapping of the entire z-plane. To
appreciate the geometrical effect of this mapping consider first the case in
which $a = 1$ so that $w = z + b$.

This has the effect of generating the w-plane by simply adding a constant
complex number $b$ to every point in the z-plane. Using the vectorial
representation of complex numbers this is seen to be equivalent to generating the
w-plane by shifting the entire z-plane through a distance $|b|$ parallel to the
vector $b$. Such a mapping is accordingly called a translation. Another way of
expressing this result is by saying that if the w- and z-planes were to be
superimposed, then the $O\{u, v\}$ axes would be obtained by translating the
$O\{x, y\}$ axes, without rotation, such that in their new position the origin
coincided with the point $z = -b$. To see this, remember that $b$ is a vector
and that the origin $w = 0$ of $O\{u, v\}$ corresponds to $z = -b$, so that its position
vector is $-b$ relative to $O\{x, y\}$, whereas the position vector of the origin of
$O\{x, y\}$ relative to $O\{u, v\}$ is $b$.
Consequently, we may conclude that the mapping $w = z + b$ leaves invariant
the shape and size of any curve in the z-plane.

Next we consider the consequences of setting $b = 0$ so that $w = az$. If
we write $a = \rho e^{i\alpha}$ and $z = re^{i\theta}$, we have $w = \rho r e^{i(\alpha + \theta)}$. This shows that the
effect on the z-plane of the mapping $w = az$ is to multiply the modulus of $z$ by
a constant factor $\rho$ and to increase the argument of $z$ by a constant angle $\alpha$.
Hence $w = az$ corresponds to a magnification, or dilatation, of every $z$ by a
constant factor $|a|$, and a rotation about the origin of every $z$ by a constant
angle $\alpha$. Thus we may deduce that the general linear transformation

$$w = az + b$$

of the z-plane may be described geometrically as the combination of a
dilatation, a rotation, and a translation. In the trivial case $a = 1$, $b = 0$ the
mapping reduces to an identity.
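The three geometrical steps can be applied one after another and compared with the direct evaluation of $az + b$. This is a minimal sketch; `decompose` and `apply_steps` are illustrative names, and the sample values are those of the introductory example $a = i$, $b = 1 + i$.

```python
import cmath

def decompose(a, b):
    """Return (scale, angle, shift) such that a*z + b = scale * e^{i*angle} * z + shift."""
    scale, angle = cmath.polar(a)          # a = scale * e^{i*angle}
    return scale, angle, b

def apply_steps(z, a, b):
    scale, angle, shift = decompose(a, b)
    z = scale * z                          # dilatation by |a|
    z = cmath.exp(1j * angle) * z          # rotation about the origin through arg a
    return z + shift                       # translation by b

a, b = 1j, 1 + 1j
print(decompose(a, b))                               # scale 1, angle pi/2, shift 1 + i
print(apply_steps(2 + 1j, a, b), a * (2 + 1j) + b)   # both equal 3i up to rounding
```

For this example the dilatation factor is 1, so only the rotation through $\tfrac{1}{2}\pi$ and the translation are visible.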

10-5 (b) The mapping $w = z^n$

A typical example of this form is provided by the function $w = z^2$. As it is
interesting to interpret mappings in terms of both polar coordinates and
cartesian coordinates, let us first study the polar representation. To do this
we set $z = re^{i\theta}$, $w = \rho e^{i\phi}$, when we find

$$\rho(\cos\phi + i\sin\phi) = r^2(\cos 2\theta + i\sin 2\theta),$$

showing that $\rho = r^2$ and $\phi = 2\theta + 2n\pi$, where $n = 0, 1, 2, \ldots$. However,
for our purposes we shall disregard this ambiguity of the angle $\phi$ with respect
to multiples of $2\pi$, since all angles in polar coordinates are indeterminate in
this manner.

In words, the effect of the mapping $w = z^2$ is to square the modulus of
every number $z$ and to double its argument. This is very easily illustrated by
appeal to Fig. 10-6 depicting the mapping of a shaded portion of an annular
region in the z-plane into another, larger, annular region in the w-plane. The
conformal nature of the mapping is reflected by the fact that at the
corresponding corners of the figures the angles between the boundary lines together
with their senses have been preserved. They are of course equal to $\tfrac{1}{2}\pi$ in this
instance.

Because of the properties just outlined it is readily seen that the function
$w = z^2$ maps the upper half z-plane onto the entire w-plane. When this is
done it is necessary to exclude the origin in the w-plane together with all the
points on the positive u-axis, since these are mapped twice. In fact they
correspond to points on both the positive and negative parts of the real axis
in the z-plane. The origin in the w-plane is in fact the image of a critical point,
for $w' = 2z$ vanishes at $z = 0$. This exclusion of a line of points in the w-plane
is often described by saying that the w-plane has been cut along the real axis.

Fig. 10-6 The polar mapping $w = z^2$.

The effect of the mapping is more striking if it is displayed in terms of
$x$ and $y$ by again setting $w = u + iv$, but this time writing $z = x + iy$ to
obtain $u = x^2 - y^2$, $v = 2xy$. These equations show, for example, that the
straight line $x = \alpha$ maps into the curve $u = \alpha^2 - y^2$, $v = 2\alpha y$ in the w-plane
which, after elimination of $y$, is seen to be equivalent to $v^2 = 4\alpha^2(\alpha^2 - u)$.
Similarly, the straight line $y = \beta$ may be seen to map into the curve
$v^2 = 4\beta^2(\beta^2 + u)$ in the w-plane. These equations describe two parabolas that are
symmetrical about the u-axis, as shown in Fig. 10-7.

Fig. 10-7 The Cartesian mapping $w = z^2$.

The lines $x = 1$, $y = 3/2$ denoted by $\gamma_1$ and $\gamma_2$, respectively, in the z-plane
map into the parabolas $\Gamma_1$ and $\Gamma_2$ in the w-plane. The parabolas $\Gamma_1$ and $\Gamma_2$
intersect in the pair of points $P'$ and $P''$, namely $(-5/4, 3)$ and $(-5/4, -3)$. The
single point $z = 1 + 3i/2$ denoted by $P$ in the z-plane (that is, the point
$(1, 3/2)$) maps into the first of these, $w = -5/4 + 3i$, whereas the second is the
image of the point $1 - 3i/2$ of $\gamma_1$ and of the point $-1 + 3i/2$ of $\gamma_2$. Again the
conformal nature of the transformation is reflected in the easily checked
geometrical fact that the two families of parabolas are mutually orthogonal,
as are the lines $x = \text{const}$, $y = \text{const}$ in the z-plane.
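The orthogonality of the two families can be checked directly from the equations of the parabolas. The short derivation below is added as a worked step and uses only the curves $v^2 = 4\alpha^2(\alpha^2 - u)$ and $v^2 = 4\beta^2(\beta^2 + u)$ obtained above for the images of $x = \alpha$ and $y = \beta$.

```latex
\begin{align*}
v^{2} = 4\alpha^{2}(\alpha^{2} - u)
  &\;\Longrightarrow\; 2v\frac{dv}{du} = -4\alpha^{2}
  \;\Longrightarrow\; \frac{dv}{du} = -\frac{2\alpha^{2}}{v}, \\
v^{2} = 4\beta^{2}(\beta^{2} + u)
  &\;\Longrightarrow\; 2v\frac{dv}{du} = 4\beta^{2}
  \;\Longrightarrow\; \frac{dv}{du} = \frac{2\beta^{2}}{v}, \\
\text{at an intersection } u = \alpha^{2} - \beta^{2},\; v = \pm 2\alpha\beta:\quad
  &\left(-\frac{2\alpha^{2}}{v}\right)\left(\frac{2\beta^{2}}{v}\right)
  = -\frac{4\alpha^{2}\beta^{2}}{v^{2}} = -1 .
\end{align*}
```

The product of the gradients is $-1$ at every intersection with $\alpha\beta \ne 0$, so the curves cut at right angles there.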

The more general mapping $w = z^n$ may be analysed in similar fashion,
though the algebraic complexity is naturally greater. When $n$ is integral the
mapping may be seen to transform the segment $0 < \arg z < 2\pi/n$ into the
complete w-plane with a suitable cut along the u-axis. (Care must be exercised
when $n$ is fractional for then the mapping is many valued. We shall not
pursue this matter further.)

10-5 (c) The inversion w = 1/z 

For obvious reasons the mapping $w = 1/z$ is called the inversion mapping.
Its geometrical effect may be deduced by setting $w = \rho e^{i\phi}$, $z = re^{i\theta}$ to find

$$\rho(\cos\phi + i\sin\phi) = \frac{1}{r}(\cos\theta - i\sin\theta).$$

Arguing as with the function $w = z^2$, we then see that this implies that
$\rho = 1/r$, $\phi = -\theta$.

Expressed in words, the inversion mapping $w = 1/z$ transforms a point
in the z-plane with modulus $r$ and argument $\theta$ into a point in the w-plane
with modulus $1/r$ and argument $-\theta$. This may be interpreted geometrically
by appeal to Fig. 10-8 in which the w- and z-planes are shown superimposed
with a common origin, and $P$ is any point in the z-plane with $P'$ denoting its
image in the w-plane.

The circle shown in Fig. 10-8 is the unit circle $|z| = 1$, and point $Q$ on
the radius vector drawn from $O$ to $P$ is such that $OP \cdot OQ = 1$. Hence if
$OP = r$, then $OQ = 1/r$. In geometrical terms point $Q$ is said to have been
obtained by inverting point $P$ with respect to the unit circle. Point $P'$, which
is the image in the w-plane of the point $P$ in the z-plane, is then obtained by
reflecting $Q$ in the x-axis.

Thus the mapping w = 1/z corresponds to the inversion of points z with 
respect to the unit circle, followed by their reflection in the real axis. The 
inversion mapping thus maps the points interior to the unit circle about the 
origin of the z-plane onto the exterior of the unit circle about the origin of 
the w-plane, and vice-versa. The two unit circles map onto one another. 

Fig. 10-8 Inversion in unit circle followed by reflection in the x-axis.

Algebraically, we write $w = u + iv$, $z = x + iy$, when

$$u = \frac{x}{x^2 + y^2}, \qquad v = \frac{-y}{x^2 + y^2}.$$



To learn how the line $x = \alpha$ in the z-plane maps onto the w-plane we need
only set $x = \alpha$ in the expressions for $u$ and $v$ and then eliminate $y$ to obtain
the equation

$$u^2 + v^2 - \frac{u}{\alpha} = 0.$$

Similarly, the line $y = \beta$ in the z-plane maps onto the curve in the w-plane
defined by the equation

$$u^2 + v^2 + \frac{v}{\beta} = 0.$$

When these equations are rewritten in the form

$$\left(u - \frac{1}{2\alpha}\right)^2 + v^2 = \left(\frac{1}{2\alpha}\right)^2$$

and

$$u^2 + \left(v + \frac{1}{2\beta}\right)^2 = \left(\frac{1}{2\beta}\right)^2$$

it is easily seen that the line $x = \alpha$ in the z-plane has for its image in the
w-plane a circle of radius $1/(2|\alpha|)$ with its centre at $(1/(2\alpha), 0)$, whilst the line
$y = \beta$ in the z-plane has for its image in the w-plane a circle of radius
$1/(2|\beta|)$ with its centre at $(0, -1/(2\beta))$. We may conclude that lines parallel to
the x- and y-axes map onto circles in the w-plane which pass through the origin
and have their centres on the u- and v-axes.

Had the general straight line $y = mx + c$ in the z-plane been mapped,
then this same form of argument would have shown that any such line not
passing through the origin will transform into a circle through the origin in
the w-plane. Lines through the origin in the z-plane transform into lines
through the origin in the w-plane. The verification of these remarks is left
as an exercise for the reader.
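A quick numerical confirmation of the circle found for the line $x = \alpha$ is sketched below. The value $\alpha = 0.5$ and the sample ordinates are arbitrary choices, and the helper name `image_of_vertical_line` is invented for the illustration.

```python
def image_of_vertical_line(alpha, ys):
    """Images under w = 1/z of the points alpha + i*y for y in ys."""
    return [1 / complex(alpha, y) for y in ys]

alpha = 0.5
centre = complex(1 / (2 * alpha), 0.0)     # predicted centre (1/(2 alpha), 0)
radius = 1 / (2 * abs(alpha))              # predicted radius 1/(2 |alpha|)

for w in image_of_vertical_line(alpha, [-3.0, -0.5, 0.0, 1.0, 10.0]):
    print(abs(w - centre) - radius)        # each value is zero up to rounding
```

As $y \to \pm\infty$ the image points approach $w = 0$, which is why the circle passes through the origin of the w-plane.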

10-5 (d) The bilinear transformation 

Any mapping of the general form

$$w = \frac{az + b}{cz + d}, \tag{10-21}$$

is called a bilinear transformation or a linear fractional transformation. The
general linear transformation and the inversion mapping are special cases of
the bilinear transformation. We now show that bilinear transformations are
characterized by the property that they map circles and straight lines in the
z-plane onto circles and straight lines in the w-plane, though not necessarily
in this order.

Let us now write the transformation (10-21) in the form

$$w = \frac{a}{c} - \frac{ad - bc}{c^2\left(z + \dfrac{d}{c}\right)}. \tag{10-22}$$

We assume $c \ne 0$ and $ad - bc \ne 0$; this is justified since if $c = 0$ the
transformation reduces to the general linear transformation, whereas if
$ad - bc = 0$, then $w$ reduces to a constant. So, if we define new variables $z_1$ and $z_2$ by

$$z_1 = z + \frac{d}{c}, \qquad z_2 = \frac{1}{z_1}, \tag{10-23}$$

then (10-22) becomes

$$w = \frac{a}{c} - \left(\frac{ad - bc}{c^2}\right)z_2. \tag{10-24}$$

We must now consider the sequential effect of the mappings that transform
from the z-plane to the w-plane via the intermediate planes $z_1$ and $z_2$.
The mapping from the z-plane to the $z_1$-plane is a pure translation and thus
leaves the shape and size of all curves invariant. The mapping from the
$z_1$-plane to the $z_2$-plane is an inversion and, as we have just seen, maps
straight lines not passing through the origin onto circles, and straight lines
through the origin onto straight lines. Finally, the mapping from the $z_2$-plane
to the w-plane is a general linear transformation and so comprises a dilatation,
a rotation, and a translation. Hence, in particular, this final mapping will
transform straight lines into straight lines and circles into circles. This justifies
our earlier statement that the bilinear transformation maps straight lines and
circles into straight lines and circles, though not necessarily in this order.
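The chain translation, inversion, linear map can be composed explicitly and compared with the direct formula (10-21). The sketch below is illustrative only: the function names are invented here and the coefficient values are arbitrary, subject to $c \ne 0$ and $ad - bc \ne 0$.

```python
def bilinear(z, a, b, c, d):
    """Direct evaluation of w = (a z + b) / (c z + d), Eqn (10-21)."""
    return (a * z + b) / (c * z + d)

def bilinear_by_steps(z, a, b, c, d):
    """The same mapping built from the three simpler mappings (requires c != 0)."""
    z1 = z + d / c                                  # translation, Eqn (10-23)
    z2 = 1 / z1                                     # inversion, Eqn (10-23)
    return a / c - ((a * d - b * c) / c**2) * z2    # linear map, Eqn (10-24)

a, b, c, d = 2 + 1j, -1j, 1 - 1j, 3.0
for z in (0.5 + 0.5j, -2.0 + 1j, 4j):
    print(abs(bilinear(z, a, b, c, d) - bilinear_by_steps(z, a, b, c, d)))   # rounding error only
```

The agreement of the two evaluations is just the algebraic identity (10-22) checked at sample points.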




Example 10-17 Find the image in the w-plane of the circle $|z| = 2$ if

$$w = \frac{z - i}{z + i}.$$

Solution Setting $w = u + iv$, $z = x + iy$ we find that

$$u = \frac{x^2 + y^2 - 1}{x^2 + y^2 + 2y + 1}, \qquad v = \frac{-2x}{x^2 + y^2 + 2y + 1}.$$

Now the circle $|z| = 2$ has the equation $x^2 + y^2 = 4$, which used in the
expressions for $u$, $v$ gives

$$u = \frac{3}{2y + 5}, \qquad v = \frac{-2x}{2y + 5}.$$

Next, solving these for $x$ and $y$, we find

$$x = \frac{-3v}{2u}, \qquad y = \frac{1}{2}\left(\frac{3}{u} - 5\right),$$

so that on the required circle $x^2 + y^2 = 4$ this pair of equations is equivalent
to

$$3(u^2 + v^2) - 10u + 3 = 0.$$

When this equation is expressed in the form

$$\left(u - \frac{5}{3}\right)^2 + v^2 = \frac{16}{9}$$

it can be recognized as the equation of a circle in the w-plane having a
radius of 4/3 and its centre at the point (5/3, 0).

This conclusion could have been obtained more easily by using the
following argument. The equation

$$w = \frac{z - i}{z + i}$$

is equivalent to

$$z = i\left(\frac{1 + w}{1 - w}\right).$$

Hence, as $z\bar{z} = x^2 + y^2$, we have

$$x^2 + y^2 = \left[i\left(\frac{1 + w}{1 - w}\right)\right]\left[-i\left(\frac{1 + \bar{w}}{1 - \bar{w}}\right)\right] = \frac{1 + w + \bar{w} + w\bar{w}}{1 - w - \bar{w} + w\bar{w}}.$$

In terms of $w = u + iv$, $\bar{w} = u - iv$ this becomes

$$x^2 + y^2 = \frac{1 + 2u + u^2 + v^2}{1 - 2u + u^2 + v^2}$$

and, on the circle $x^2 + y^2 = 4$, it reduces to our previous result

$$3(u^2 + v^2) - 10u + 3 = 0.$$
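The result of Example 10-17 is easily confirmed by mapping a handful of points of the circle $|z| = 2$ and measuring their distance from $(5/3, 0)$. The sketch below is illustrative; the name `w_of` and the choice of twelve sample points are assumptions.

```python
import cmath
import math

def w_of(z):
    """The bilinear transformation of Example 10-17."""
    return (z - 1j) / (z + 1j)

# Twelve equally spaced points on the circle |z| = 2
points = [2 * cmath.exp(1j * 2 * math.pi * k / 12) for k in range(12)]
deviations = [abs(abs(w_of(z) - 5 / 3) - 4 / 3) for z in points]
print(max(deviations))     # zero up to rounding: images lie on |w - 5/3| = 4/3
```

Every image point lies at distance 4/3 from $w = 5/3$, in agreement with the equation $3(u^2 + v^2) - 10u + 3 = 0$.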

10-6 Applications of conformal mapping 

In any first account of the theory of conformal mapping, it is impossible to 
do more than merely indicate its application in science and engineering. 
From the fields of elasticity, electromagnetic theory, fluid mechanics, and 
heat conduction in which these ideas play important roles, we choose just one 
simple example. Our choice, from fluid mechanics, is solving the problem of 
the two-dimensional flow of an incompressible fluid around the interior of a 
wedge shaped region, on the assumption that the flow has a special property 
which enables it to be classified as being irrotational. These are in fact con- 
ditions which are usually valid in most low speed flows of ordinary fluids. 

In books on fluid mechanics it is established that if $q_1$ and $q_2$ are the x
and y components of velocity at a point in an incompressible inviscid fluid
that is undergoing two-dimensional flow, then under the stated conditions
these components may be written in the form

$$q_1 = \frac{\partial\phi}{\partial x}, \qquad q_2 = \frac{\partial\phi}{\partial y}, \tag{10-25}$$

where $\phi(x, y)$ is a function called the velocity potential of the flow. The lines
$\phi(x, y) = \text{constant}$ are called equipotentials. Using the vector interpretation
of complex numbers we may thus represent the fluid velocity $q$ by the complex
variable

$$q = \frac{\partial\phi}{\partial x} + i\frac{\partial\phi}{\partial y}. \tag{10-26}$$

It can also be established that if fluid is neither created nor lost within the
flow region, then $\phi(x, y)$ must be such that

$$\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = 0. \tag{10-27}$$

Thus $\phi$ satisfies Laplace's equation and so is harmonic. Introducing the
harmonic conjugate of $\phi$, which we shall denote by $\psi(x, y)$, enables us to
define a further complex variable $F(z)$ by the equation

$$F(z) = \phi(x, y) + i\psi(x, y). \tag{10-28}$$

This is called the complex potential and $\psi(x, y)$ itself is called the stream
function of the flow. Now by the nature of the construction of $F(z)$, it is
differentiable in the sense of Definition 10-6 and so satisfies the
Cauchy-Riemann equations. Hence

$$\phi_x = \psi_y, \qquad \phi_y = -\psi_x$$

or, in terms of $q_1$ and $q_2$,

$$q_1 = \phi_x = \psi_y, \qquad q_2 = \phi_y = -\psi_x. \tag{10-29}$$

These relationships provide the justification for the name stream function,
for they show that the velocity vector is everywhere normal to the curves
$\phi(x, y) = \text{const}$. This follows because on $\phi(x, y) = \text{const}$, $\phi_x\,dx + \phi_y\,dy = 0$,
showing that

$$\frac{dy}{dx} = \frac{-\phi_x}{\phi_y}.$$

Hence if $n$ is the gradient of the normal to a curve $\phi(x, y) = \text{const}$, then
$n(dy/dx) = -1$, whence $n = \phi_y/\phi_x$. However, from results (10-29) this is
equivalent to $n = q_2/q_1$, which is the slope of the curve traced by a fluid
particle. On a curve $\psi(x, y) = \text{const}$ the gradient is $dy/dx = -\psi_x/\psi_y$ which,
by results (10-29), is also $q_2/q_1$. Hence the curves $\psi(x, y) = \text{const}$ are curves
along which fluid flows and so can properly be called streamlines.
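The same conclusion can be reached in one line with gradient vectors. The following worked step is an addition that uses only the Cauchy-Riemann relations $\phi_x = \psi_y$, $\phi_y = -\psi_x$ of Eqn (10-29); the symbol $\nabla$ denotes the two-dimensional gradient $(\partial/\partial x, \partial/\partial y)$.

```latex
\begin{align*}
\nabla\phi \cdot \nabla\psi
  &= \phi_x\psi_x + \phi_y\psi_y \\
  &= \phi_x(-\phi_y) + \phi_y(\phi_x) \\
  &= 0 .
\end{align*}
```

Since $\nabla\phi$ is the velocity vector $(q_1, q_2)$ and $\nabla\psi$ is normal to the curves $\psi = \text{const}$, the velocity is tangent to those curves, and the equipotentials and streamlines intersect at right angles wherever the velocity is non-zero.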
Consider the complex potential

$$F(z) = U_0 z, \tag{10-30}$$

where $U_0$ is a positive real number. Then we have at once

$$\phi = U_0 x, \qquad \psi = U_0 y. \tag{10-31}$$

The streamlines $\psi = a$ are thus the lines $y = a/U_0$, and the velocity $q$ is

$$q = \frac{\partial \phi}{\partial x} + i\frac{\partial \phi}{\partial y} = U_0.$$

Thus the complex potential $F(z) = U_0 z$ must characterize a uniform flow, with velocity $U_0$ parallel to the x-axis and directed in the sense of increasing x. This is illustrated in Fig. 10-9 (a).



Fig. 10-9 Transformation of fluid flow: (a) uniform flow in upper half plane; (b) flow inside wedge.




Now if we consider the transformation

$$w = z^{1/3}, \tag{10-32}$$

then we know from the arguments used in connection with the mapping $w = z^n$ that it will map the upper half of the z-plane onto the wedge $0 < \arg w < \tfrac{1}{3}\pi$ in the w-plane.

Then, as (10-32) is equivalent to $z = w^3$, we must have

$$x + iy = (u^3 - 3uv^2) + i(3u^2 v - v^3)$$

giving

$$x = u^3 - 3uv^2, \qquad y = 3u^2 v - v^3. \tag{10-33}$$

Hence the velocity potential is

$$\phi = U_0(u^3 - 3uv^2) \tag{10-34}$$

and the stream function

$$\psi = U_0(3u^2 v - v^3). \tag{10-35}$$

Thus the curves $\psi = \text{const}$ define the streamlines inside the wedge shaped region, and some representative streamlines are shown in Fig. 10-9 (b). To determine the speed at any point within the wedge we use the fact that

$$\frac{dF}{dz} = \frac{\partial \phi}{\partial x} + i\frac{\partial \psi}{\partial x} = q_1 - iq_2,$$

showing that the speed $|q|$ is given by

$$|q| = \left|\frac{dF}{dz}\right|. \tag{10-36}$$



As the complex potential in the w-plane is

$$F = U_0 w^3, \tag{10-37}$$

we have

$$\frac{dF}{dw} = 3U_0 w^2$$

and, finally, applying (10-36) with $w$ in place of $z$,

$$|q| = 3U_0\,|w^2| = 3U_0\,|(u + iv)^2| = 3U_0(u^2 + v^2). \tag{10-38}$$

Thus at a point P with coordinates $(u_0, v_0)$ within the wedge, the speed $|q| = 3U_0(u_0^2 + v_0^2)$. The streamline through the point P is provided by Eqn (10-35), for the constant associated with this streamline through P must be $U_0(3u_0^2 v_0 - v_0^3)$, so that the streamline itself has the equation

$$3u^2 v - v^3 = 3u_0^2 v_0 - v_0^3.$$
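The wedge-flow quantities can be checked numerically. The following Python sketch is illustrative only: the function name `wedge_flow` and the sample values are assumptions, not part of the text. It evaluates the complex potential $F = U_0 w^3$ at $w = u + iv$, reads off $\phi$ and $\psi$ as its real and imaginary parts, and takes the speed to be $|dF/dw| = 3U_0|w|^2$.

```python
def wedge_flow(u, v, U0=1.0):
    """Return (phi, psi, speed) at the point w = u + iv inside the wedge.

    Assumes the complex potential F = U0 * w**3, so that phi = Re F,
    psi = Im F and speed = |dF/dw| = 3 * U0 * |w|**2.
    """
    w = complex(u, v)
    F = U0 * w**3
    speed = abs(3.0 * U0 * w**2)
    return F.real, F.imag, speed


if __name__ == "__main__":
    phi, psi, speed = wedge_flow(1.0, 0.5, U0=2.0)
    # phi and psi should agree with U0*(u^3 - 3uv^2) and U0*(3u^2 v - v^3).
    print(phi, psi, speed)
```

Points sharing the same value of `psi` lie on one streamline, so tabulating `psi` over a grid of $(u, v)$ values is one way to sketch curves like those in Fig. 10-9 (b).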




As mentioned at the beginning of this section, conformal mapping has 
many other applications, all related to solutions of Laplace's equation in 
two dimensions. The application described here can provide no more than 
an indication of one of these situations. 

PROBLEMS 

Section 10-1

10-1 Test the following sequences $\{z_n\}$ for convergence and, where appropriate, find the limit, stating whether or not it is a member of the sequence.

(a) $z_n = 2^n + i\,3^{-n}$;

(b) $z_n = n \tan\dfrac{3}{n} + i\,n \sin\dfrac{1}{n}$;

< c > *» = „-(_i). + 4 '; 



(e)z n = isin^+icos^. 



10-2 Give examples of:

(a) a non-convergent sequence $\{z_n\}$;

(b) a convergent sequence $\{z_n\}$ with limit $2 + 3i$.

10-3 Given that the sequences {w n }, {z„} are defined by 

-=( 1+ -I) + '-(5^t) - *— -l + 'fci^)- 

find the limits of the sequences {w n + z„}, {w„z n } and {w„/z„}. 
10-4 Identify the limit points of the sequence {z„} where 

10-5 The general term of the sequence {z„} is 

_ / 2«2 + 1 \ . / «>W \ 
*•" (3^ + 2 w + 3J + ' COS (^nj- 

Find values of a for which {z„} has : 

(a) one limit point, 

(b) two limit points, 

and state their location. Are the values of a unique? 

10-6 Construct examples of a sequence {z„} which has : 

(a) two limit points; 

(b) three limit points; 

(c) no limit points. 






Section 10-2 
10-7 Sketch each of the following curves defined in the complex plane:

(a) $x = s$, $y = \sqrt{(1 - s^2)}$ for $-1 < s < 1$;

(b) $x = a \sin s$, $y = b \cos s$ for $0 < s < 2\pi$ ($a$, $b$ real);

(c) $x = \cosh s$, $y = \sinh s$ for $-\infty < s < \infty$;

(d) $|z + 2 - i| = 3$;

(e) $z\bar{z} = 4$.

Sketch the region defined by each of the following sets of inequalities and 
indicate when the boundary points belong to the region so defined. 

10-8 Im(z + iz) > and Re z > 0. 

10-9 2 < | z | < 3 with < arg z < \n. 

10-10 1 < | z - 1 | < 2 and 1 < | z + 1 | < 2. 

10-11 Sketch the region that lies inside the curve defined by 

arg (z + 2) - arg (z + 3) = in 

and is such that Im z > J. 

Give an alternative representation of this region. 

10-12 Draw the curve C defined by

arg (z - - arg (z - 1) = \t*. 




Problem 10-13 

10-13 Define the figure-eight-shaped curve shown in the diagram in terms of arguments of complex numbers. The curves $C_1$ and $C_2$ are arcs of circles with centres $O_1$ and $O_2$, respectively.

10-14 Sketch a simply shaped region in the complex plane and define it: 

(a) parametrically; 

(b) directly in terms of z. 



Section 10-3



10-15 For what values of $z$ are the following complex functions defined:
(a) $w = z^2 + iz + 1$; (b) $w = (z - 1)/(z - 2)$;

(c) $(z + 1)(z - i)(z^2 + 4)$; (d) $w = \sinh z$.

10-16 If $f(z) = u + iv$, find the expressions for the functions $u$, $v$ in terms of $x$, $y$ given that:

(a) /(z) = z 2 + zz + 1 ; (b) /(z) = £±i; 

(c) /(z) = cosh z; (d) /(z) = cos z. 

10-17 Given the following forms of $f(z)$ deduce their value if $z = 1 + 2i$:
(a) /(z) = x 2 + 3xj + iy 2 ; 

x 2 + 2;> + 1. 

x 2 + y 

(c) /(z) = sin y (x 2 - ;> 2 ) + / cos ^f (x 2 + if), 



(")/(*)= r24 ., ;2 



10-18 Use Definition 10-4 to prove

$$\lim_{z \to 1 - i} (2z^2 - 1) = -(1 + 4i).$$

10-19 Use Definition 10-4 to prove

Hm (£=!)- -6. 

z— 8/2 \2z + 3 J 



10-20 Use Definition 10-4 to prove

(2 - /z)(z 2 - 1) 

hm — r; — u — = 2( - 2 - ')• 

z— 1 (.Z — 1) 

10-21 Given that $f(z) = z^2 + z - 2$, $g(z) = z + 2$ deduce:

(a) lim[/(z) + 2 (? (z)]; 

z— 2 

(b) limf(z)g(z); 

z->-i 

(0 Hm M 

z-i-2^0) 



10-22 Prove that 

lim fc\ = 

1*1-0 \ z J 

by considering 



lim 
1*1 



o \ zz / 



writing z = x + iy, and then arriving at the result by displaying the function 
whose limit is to be considered in terms of its real and imaginary parts. 






Deduce that 



lim 
|s|-*o 



(sin az\ 
— )-"■ 



where a may be a complex number. 
10-23 Use the result 



lim l^) = 1 

UI-o\ z / 



established in Problem 10-22 above together with the identity 
— cos 2 z 



(sin z\ 1 



(l + : 



to prove that 

lim (Lz™l\ = o. 

10-24 For what value of a is the function 
3z for z t^ i 
for z = i, 
continuous at z = /. 
10-25 Give an example of a function $f(z)$ that:

(a) is continuous everywhere;

(b) has a limit $3 + 2i$ as $z \to 1 + i$, but is not continuous at $z = 1 + i$.

10-26 Use Definition 10-5 to give a direct proof that $f(z) = z^2$ is continuous everywhere.

10-27 Use the trigonometric identity

$$\sin z - \sin z_0 = 2\cos\left(\frac{z + z_0}{2}\right)\sin\left(\frac{z - z_0}{2}\right)$$

and the last result of Problem 10-22 above to give a direct proof that $f(z) = \sin z$ is continuous for all $z$.

10-28 Give reasons to justify the assertion that

$$f(z) = z\sin(z^2 + 3z + 2) + 1/(z + 2 - i)$$

is continuous everywhere except at $z = -2 + i$.

Section 10-4 

10-29 Use Definition 10-6 to prove that if $w = az^2$, where $a$ is any constant, then

$$\frac{dw}{dz} = 2az$$

for all $z$.

10-30 Use Definition 10-6 to prove that if /(z) is a differentiable function of z in 
some region, then in that region 

& WW -& + >%■ 






10-31 By using the series representation of the hyperbolic sine function prove that

$$\lim_{|z| \to 0}\left(\frac{\sinh z}{z}\right) = 1.$$

Then, using the identity

$$\sinh z_1 - \sinh z_2 = 2\sinh[(z_1 - z_2)/2]\cosh[(z_1 + z_2)/2],$$

which may be derived directly from identity (6-29), show by means of Definition 10-6 that if $w = \sinh z$, then

$$\frac{dw}{dz} = \cosh z$$

for all $z$.

10-32 Show by means of Definition 10-6 that the function $f(z) = |z|$ is not differentiable at the origin. Find the limiting value assumed by the difference quotient at the origin (that is, with $z_0 = 0$) as $h \to 0$ along the line $y = kx$.

10-33 Determine which of the following functions /(z) satisfy the Cauchy-Riemann 
equations : 

(a) /-(z) = z 3 -;z 2 + 3; 
(b)/(z) = cosh(z + 3/); 

(c) /(z) = z sin z + zz; 

(d) /(z) = (*3 - 3xy 2 ) + iQx 2 v - y 3 ); 

(e) /(z) = z(r + z)/2; 

(f ) /(z) = sinh 3x cos j + i cosh 3x sin j\ 

10-34 Find the points, if any, at which the following functions are not analytic: 

(a) /(z) = 3z + sinhz; 

(b) /(z) = z\(z + 2); 

(c) f(z) = cos 1/z; 

(d)/(z) = |^. 

10-35 Find the values of the constants a and b in order that the functions w should 
satisfy the Cauchy-Riemann equations : 

(a) w = a sin x cosh Z>/ + /2 cos jr sinh/; 

(b) w = x 3 - oxy 2 - x + 1 + /'(3^ 2 - by 3 - 1). 

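10-36 Using the method outlined in the text, show that if $x = r\cos\theta$, $y = r\sin\theta$, then the polar form of the Cauchy-Riemann equations is:

$$\frac{\partial u}{\partial r} = \frac{1}{r}\frac{\partial v}{\partial \theta} \quad \text{and} \quad \frac{1}{r}\frac{\partial u}{\partial \theta} = -\frac{\partial v}{\partial r}.$$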

10-37 Determine which of the following functions /(z) satisfy the Cauchy-Riemann 
equations : 

(a) w = (r 2 cos 2 6 + 2) + ;> 2 sin 2 0; 

(b) w - (r 3 cos 36 + 2r cos 6 + 4) + i(r 3 sin 36 + r sin «); 

(c) w = |r + -| cos d + i (r - -\ sin 9; 

(d) w = r 2 cos 2 9 + /V 2 sin 2 6 + 4; 

(e) w = sin (r cos 9) . cosh (r sin 9) + / cos (r cos 9) . sinh (r sin 9). 




10-38 Find the values of the constants a, b, and c in order that the following 
functions should satisfy the Cauchy-Riemann equations: 

(a) w = a log r + i(6 + br); 

(b) w = r a cos £0 + ibr c sin \ d. 

10-39 Verify that the following functions w satisfy the Cauchy-Riemann equations 
and in each case express the derivative of w as a function of z: 

(a) $w = (x^3 - 3xy^2 + y) + i(3x^2 y - y^3 - x)$;

(b) $w = (x \sinh x \cos y - y \cosh x \sin y) + i(y \sinh x \cos y + x \cosh x \sin y)$;

(c) $w = e^{ax}(\cos ay + i \sin ay)$.

10-40 Find which of the following pairs of functions are harmonic conjugates. 
Deduce the representation of w = u + iv in terms of z for the pairs that are 
harmonic conjugates, first by using Rule 2 (a), and then by using Rule 2 (b) : 

(a) $u = x^2 - y^2 + 2y$, $v = 2x(y - 1)$;

(b) $u = \sin x \cosh y$, $v = \cos x \sinh y$;

(c) $u = x \sin x \cosh y - y \cos x \sinh y$, $v = -(x \cos x \sinh y + y \sin x \cosh y)$;

(d) $u = \sinh x \cos y$, $v = \cosh x \sin y$.

10-41 Show by differentiation that v = x 2 — y 2 + 2y is harmonic and deduce its 
harmonic conjugate u. Express the function w = u + iv, and its derivative, 
in terms of z. 

10-42 Show by differentiation that u = cosh x cos y is harmonic and deduce its 
harmonic conjugate v. Express the function w = u + iv, and its derivative, 
in terms of z. 

Section 10-5

10-43 Sketch the images in the w-plane of the line $y = 2x - 1$ in the z-plane that result from the mappings:

(a) $w = iz - (2 + i)$;

(b) $w = 2z + 3$;

(c) $w = (1 + i)z + 1$.

10-44 Determine the images in the w-plane of the circle $|z - 1| = 1$ in the z-plane that result from the mappings:

(a) w = Iz — i; 

(b) w = (i - 1)2 + 2. 

In each case shade the regions in the w-plane that correspond to the interior 
of the circle | z — 1 | = 1. 

10-45 Sketch the region in the w-plane corresponding to the region $x > 2$, $y < x$ in the z-plane given that

w = (2i - X)z + (1 + /). 

10-46 Determine the equation of the line in the w-plane which is the image of the line $x = 1$ in the z-plane under the mapping

$$w = z^3.$$

10-47 Give an algebraic proof that if $c \neq 0$, then the general straight line $y = mx + c$ in the z-plane is mapped by the transformation $w = 1/z$ onto a circle in the w-plane.



10-48 Find the image in the w-plane of the circle $|z| = 2$ if

2z + ;' 
w = -• 

Z — 51 

10-49 Show that $w = e^z$ maps the straight lines $y = \text{const}$ in the z-plane onto straight lines through the origin in the w-plane, and the straight lines $x = \text{const}$ in the z-plane onto circles about the origin in the w-plane.

10-50 Locate the critical points of $w = \sin z$ and show that it maps the region $-\tfrac{1}{2}\pi < x < \tfrac{1}{2}\pi$, $y > 0$ in the z-plane onto the upper half of the w-plane.

Section 10-6 





[Figure for Problem 10-51]



10-51 Using the argument given in the text, show how the complex potential $F(z) = U_0 z$ and the mapping $w = z^{1/2}$ may be used to find the streamlines indicated in the figure.

Find the speed of flow at a point P with coordinates $(u_0, v_0)$ and determine the streamline and the equipotential through P.



Scalars, vectors, and 
fields 



11-1 Curves in space 

If the coordinates $(x, y, z)$ of a point P in space are described by

$$x = f(t), \qquad y = g(t), \qquad z = h(t), \tag{11-1}$$

where $f$, $g$, $h$ are continuous functions of $t$, then as $t$ increases so the point P moves in space tracing out some curve. It follows that Eqns (11-1) represent a parametric description of a curve $\Gamma$ in space and, furthermore, that they define a direction along the curve $\Gamma$ corresponding to the direction in which P moves as $t$ increases. For example, the parametric equations

$$x = 2\cos 2\pi t, \qquad y = 2\sin 2\pi t, \qquad z = 2t,$$

for $0 < t < 1$ describe one turn of a helix, as may be seen by noticing that the projection of the point P on the $(x, y)$-plane traces one revolution of the circle $x^2 + y^2 = 4$ as $t$ increases from $t = 0$ to $t = 1$, whilst the z-coordinate of P steadily increases from $z = 0$ to $z = 2$.
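As a small illustration of a parametric description, the helix above can be sampled directly. This Python sketch is not part of the text; the function name `helix` and the choice of sample points are assumptions made for the example.

```python
import math


def helix(t):
    """Point (x, y, z) on the helix x = 2 cos 2*pi*t, y = 2 sin 2*pi*t, z = 2t."""
    return (2.0 * math.cos(2.0 * math.pi * t),
            2.0 * math.sin(2.0 * math.pi * t),
            2.0 * t)


if __name__ == "__main__":
    for k in range(5):
        t = k / 4.0
        x, y, z = helix(t)
        # The projection onto the (x, y)-plane stays on the circle x^2 + y^2 = 4.
        print(f"t={t:.2f}  x={x:+.3f}  y={y:+.3f}  z={z:.3f}  x^2+y^2={x*x + y*y:.3f}")
```

Increasing `t` moves the sampled point once round the circle while `z` rises steadily, which is the direction along the curve that the parametrization defines.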

If we now denote by $\mathbf{r}$ the position vector OP of a point P on $\Gamma$ relative to the origin O of our coordinate system, and introduce the triad of orthogonal unit vectors $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ used in Chapter 4, it follows that (Fig. 11-1)

$$\mathbf{r} = f(t)\mathbf{i} + g(t)\mathbf{j} + h(t)\mathbf{k}. \tag{11-2}$$

Expressions of this form are called vector functions of one real variable, in which the dependence on the parameter $t$ is often displayed concisely by writing $\mathbf{r} = \mathbf{r}(t)$. The name vector function arises because $\mathbf{r}$ is certainly a vector and, as it depends on the real independent variable $t$, it must also be a function in the sense that to each $t$ there corresponds a vector $\mathbf{r}(t)$. Knowledge of the vector function $\mathbf{r}(t)$ implies knowledge of the three scalar functions $f$, $g$, and $h$, and conversely.

The geometrical analogy used here to interpret a general vector function $\mathbf{r}(t)$ is particularly valuable in dynamics where the point P(t) with position vector $\mathbf{r}(t)$ usually represents a moving particle, and the curve $\Gamma$ its trajectory in space. Under these conditions it is frequently most convenient if the parameter $t$ is identified with the time, though in some circumstances identification with the distance $s$ to P measured along $\Gamma$ from some fixed point on $\Gamma$ is preferable. Useful though these geometrical and dynamical analogies are, we shall in the main use them only to help further our understanding of general vector functions.



Fig. 11-1 Vector function of one variable interpreted as a curve in space.

The name vector function suggests, correctly, that it is possible to give satisfactory meanings to the terms limit, continuity, and derivative when applied to $\mathbf{r}(t)$. As in the ordinary calculus, the key concept is that of a limit. Intuitively the idea of a limit is clear: when we say $\mathbf{u}(t)$ tends to a limit $\mathbf{v}$ as $t \to t_0$, we mean that when $t$ is close to $t_0$, the vector function $\mathbf{u}(t)$ is in some sense close to the vector $\mathbf{v}$. In what sense though can the two vectors $\mathbf{u}(t)$ and $\mathbf{v}$ be said to be close to one another? Ultimately, all that is necessary is to interpret this as meaning that $|\mathbf{u}(t) - \mathbf{v}|$ is small.

So, we shall say that $\mathbf{u}(t)$ tends to the limit $\mathbf{v}$ as $t \to t_0$ if, by taking $t$ sufficiently close to $t_0$, it is possible to make $|\mathbf{u}(t) - \mathbf{v}|$ arbitrarily small. As with our previous notion of continuity we shall then say that $\mathbf{u}(t)$ is continuous at $t_0$ if $\lim_{t \to t_0} \mathbf{u}(t) = \mathbf{v}$ and, in addition, $\mathbf{u}(t_0) = \mathbf{v}$. We incorporate these ideas into a formal definition as follows:



Definition 11-1 (vector functions: limits and continuity) Let $\mathbf{u}(t) = u_1(t)\mathbf{i} + u_2(t)\mathbf{j} + u_3(t)\mathbf{k}$ and $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$, then if for any $\varepsilon > 0$ there is some number $\delta$ such that

$$|\mathbf{u}(t) - \mathbf{v}| < \varepsilon \quad \text{when} \quad |t - t_0| < \delta,$$

we shall say that $\mathbf{u}(t)$ tends to the limit $\mathbf{v}$ as $t \to t_0$, and write

$$\lim_{t \to t_0} \mathbf{u}(t) = \mathbf{v}.$$

If in addition $\mathbf{u}(t_0) = \mathbf{v}$, then $\mathbf{u}(t)$ will be said to be continuous at $t = t_0$. A vector function that is continuous at all points in the interval $a < t < b$ will be said to be continuous throughout that interval.

As usual, a vector function that is not continuous at $t = t_0$ will be said to be discontinuous. It is obvious from this definition that $\mathbf{u}(t)$ can only tend to the limit $\mathbf{v}$ as $t \to t_0$ if the limit of each component of $\mathbf{u}(t)$ is equal to the corresponding component of the vector $\mathbf{v}$. Thus the limit of a vector function of one variable is directly related to the limits of the three scalar functions of one variable $u_1(t)$, $u_2(t)$, and $u_3(t)$. This is proved by writing

$$|\mathbf{u}(t) - \mathbf{v}| = [(u_1(t) - v_1)^2 + (u_2(t) - v_2)^2 + (u_3(t) - v_3)^2]^{1/2},$$

showing that $|\mathbf{u}(t) - \mathbf{v}| < \varepsilon$ as $t \to t_0$ is only possible if

$$\lim_{t \to t_0} (u_i(t) - v_i) = 0 \quad \text{for } i = 1, 2, 3,$$

or

$$\lim_{t \to t_0} u_1(t) = v_1, \qquad \lim_{t \to t_0} u_2(t) = v_2, \qquad \lim_{t \to t_0} u_3(t) = v_3.$$

A systematic application of these arguments enables the following theorem to be proved.

Theorem 11-1 (continuous vector functions) If the vector functions $\mathbf{u}(t)$, $\mathbf{v}(t)$ are defined and continuous throughout the interval $a < t < b$, then the vector functions $\mathbf{u}(t) + \mathbf{v}(t)$, $\mathbf{u}(t) \times \mathbf{v}(t)$, and the scalar function $\mathbf{u}(t) \cdot \mathbf{v}(t)$ are also defined and continuous throughout that same interval.

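Example 11-1 At what points are the vector functions $\mathbf{u}(t)$, $\mathbf{v}(t)$ discontinuous if

$$\mathbf{u}(t) = \sin t\,\mathbf{i} + \sec t\,\mathbf{j} + \frac{1}{t - 1}\mathbf{k},$$

$$\mathbf{v}(t) = t\,\mathbf{i} + (1 + t^2)\mathbf{j} + e^t\mathbf{k}.$$

Verify by direct calculation that $\mathbf{u}(t) + \mathbf{v}(t)$, $\mathbf{u}(t) \cdot \mathbf{v}(t)$, and $\mathbf{u}(t) \times \mathbf{v}(t)$ are continuous functions in any interval not containing a point of discontinuity of $\mathbf{u}(t)$ or $\mathbf{v}(t)$.

Solution The $\mathbf{i}$ component of $\mathbf{u}(t)$ is defined and continuous for all $t$, whereas the $\mathbf{j}$ component is discontinuous for $t = (2n + 1)\tfrac{1}{2}\pi$ with $n = 0, \pm 1, \pm 2, \ldots$ and the $\mathbf{k}$ component is discontinuous for the single value $t = 1$. All three components of $\mathbf{v}(t)$ are continuous for all $t$. We have by vector addition

$$\mathbf{u}(t) + \mathbf{v}(t) = (t + \sin t)\mathbf{i} + (1 + t^2 + \sec t)\mathbf{j} + \left(e^t + \frac{1}{t - 1}\right)\mathbf{k}$$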

showing that the components of $\mathbf{u}(t) + \mathbf{v}(t)$ give rise to the same points of discontinuity as the function $\mathbf{u}(t)$. We may thus conclude that the vector sum is continuous throughout any interval not containing one of these points. For example, $\mathbf{u}(t) + \mathbf{v}(t)$ is continuous in both the open interval $(\tfrac{1}{2}\pi, 3\pi/2)$ and the closed interval $[5, 7]$ but it is discontinuous in $(0, \pi)$.

The scalar product $\mathbf{u}(t) \cdot \mathbf{v}(t)$ is given by

$$\mathbf{u}(t) \cdot \mathbf{v}(t) = t \sin t + (1 + t^2)\sec t + \frac{e^t}{t - 1}$$

which is, of course, a scalar. Again we see by inspection that the scalar product is continuous in any interval not containing a point of discontinuity of $\mathbf{u}(t)$.

The vector product $\mathbf{u}(t) \times \mathbf{v}(t)$ is

$$\mathbf{u}(t) \times \mathbf{v}(t) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \sin t & \sec t & 1/(t - 1) \\ t & 1 + t^2 & e^t \end{vmatrix}$$

giving,

$$\mathbf{u}(t) \times \mathbf{v}(t) = \left[e^t \sec t - \frac{1 + t^2}{t - 1}\right]\mathbf{i} + \left[\frac{t}{t - 1} - e^t \sin t\right]\mathbf{j} + [(1 + t^2)\sin t - t \sec t]\mathbf{k}.$$

Here also inspection of the components shows that the vector product is continuous in any interval not containing a point of discontinuity of $\mathbf{u}(t)$.
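The scalar and vector products of Example 11-1 can be cross-checked numerically. The Python sketch below is illustrative only; the helper names (`u`, `v`, `dot`, `cross`, `dot_formula`, `cross_formula`) are assumptions introduced for this check. It forms the products from the components and compares them with the closed forms obtained in the solution, at points away from $t = 1$ and the odd multiples of $\tfrac{1}{2}\pi$.

```python
import math


def u(t):
    """Components of u(t) = sin t i + sec t j + 1/(t - 1) k."""
    return (math.sin(t), 1.0 / math.cos(t), 1.0 / (t - 1.0))


def v(t):
    """Components of v(t) = t i + (1 + t^2) j + e^t k."""
    return (t, 1.0 + t * t, math.exp(t))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot_formula(t):
    """Closed form of u . v from the worked solution."""
    return t * math.sin(t) + (1.0 + t * t) / math.cos(t) + math.exp(t) / (t - 1.0)


def cross_formula(t):
    """Closed form of u x v from the worked solution."""
    sec = 1.0 / math.cos(t)
    return (math.exp(t) * sec - (1.0 + t * t) / (t - 1.0),
            t / (t - 1.0) - math.exp(t) * math.sin(t),
            (1.0 + t * t) * math.sin(t) - t * sec)


if __name__ == "__main__":
    t = 2.0
    print(dot(u(t), v(t)), dot_formula(t))
    print(cross(u(t), v(t)), cross_formula(t))
```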

The following definition (interpreted later) shows that, as might be expected, the idea of a derivative can also be applied to vector functions of one variable.

Definition 11-2 (derivative of vector function) Let $\mathbf{u}(t)$ be a continuous vector function throughout some interval $a < t < b$ at each point of which the limit

$$\lim_{\Delta t \to 0} \frac{\mathbf{u}(t + \Delta t) - \mathbf{u}(t)}{\Delta t}$$

is defined. Then $\mathbf{u}(t)$ is said to be differentiable throughout that interval with the derivative

$$\frac{d\mathbf{u}}{dt} = \lim_{\Delta t \to 0} \frac{\mathbf{u}(t + \Delta t) - \mathbf{u}(t)}{\Delta t}.$$

The geometrical interpretation of the derivative of a vector function of a real variable is apparent in Fig. 11-2. In that figure the curve $\Gamma$ is described by a point P(t) with position vector $\mathbf{u}(t)$ relative to O. The point denoted by P'(t + Δt) is the position assumed by $\mathbf{u}$ at time $t + \Delta t$, so that $OP = \mathbf{u}(t)$, $OP' = \mathbf{u}(t + \Delta t)$, and $PP' = \Delta\mathbf{u}$ is the increment in $\mathbf{u}(t)$ consequent upon the increment $\Delta t$ in $t$.

Fig. 11-2 Geometrical interpretation of du/dt.

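It is obvious that as $\Delta t \to 0$, so the vector $\Delta\mathbf{u}$ tends to the line of the tangent to the curve $\Gamma$ at P(t) with $\Delta\mathbf{u}$ being directed from P to P'. To interpret $d\mathbf{u}/dt$ in terms of components when $\mathbf{u}(t) = u_1(t)\mathbf{i} + u_2(t)\mathbf{j} + u_3(t)\mathbf{k}$, we need only observe that

$$\begin{aligned}
\frac{d\mathbf{u}}{dt} &= \lim_{\Delta t \to 0} \frac{\mathbf{u}(t + \Delta t) - \mathbf{u}(t)}{\Delta t} \\
&= \lim_{\Delta t \to 0}\left[\frac{u_1(t + \Delta t) - u_1(t)}{\Delta t}\right]\mathbf{i} + \lim_{\Delta t \to 0}\left[\frac{u_2(t + \Delta t) - u_2(t)}{\Delta t}\right]\mathbf{j} + \lim_{\Delta t \to 0}\left[\frac{u_3(t + \Delta t) - u_3(t)}{\Delta t}\right]\mathbf{k},
\end{aligned}$$

from which it follows that

$$\frac{d\mathbf{u}}{dt} = \frac{du_1}{dt}\mathbf{i} + \frac{du_2}{dt}\mathbf{j} + \frac{du_3}{dt}\mathbf{k}. \tag{11-3}$$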



The unit vector $\mathbf{T}$ that is tangent to $\Gamma$ at P(t) and points in the direction in which P(t) will move with increasing $t$ is obviously

$$\mathbf{T} = \frac{d\mathbf{u}}{dt} \bigg/ \left|\frac{d\mathbf{u}}{dt}\right|. \tag{11-4}$$



If $s$ is the distance to P measured positively in the sense P to P' along $\Gamma$ from some fixed point on that curve (Fig. 11-2), then we know from our work with differentials that $du_1 = u_1'\,dt$, $du_2 = u_2'\,dt$, $du_3 = u_3'\,dt$. Now as the differentials $du_1$, $du_2$, $du_3$ are mutually orthogonal and represent the increments in the coordinates $[u_1(t), u_2(t), u_3(t)]$ of P to an adjacent point distant $ds$ away along $\Gamma$ with coordinates $[u_1(t + dt), u_2(t + dt), u_3(t + dt)]$, we may apply Pythagoras' theorem to obtain

$$(ds)^2 = (u_1'\,dt)^2 + (u_2'\,dt)^2 + (u_3'\,dt)^2,$$

whence

$$\frac{ds}{dt} = \left[\left(\frac{du_1}{dt}\right)^2 + \left(\frac{du_2}{dt}\right)^2 + \left(\frac{du_3}{dt}\right)^2\right]^{1/2}. \tag{11-5}$$

Comparison of Eqns (11-3) and (11-5) then gives the result

$$\left|\frac{d\mathbf{u}}{dt}\right| = \frac{ds}{dt}, \tag{11-6}$$

from which we see that if $t$ is regarded as time, then the vector function $\mathbf{v} = d\mathbf{u}/dt$ is the velocity vector of P(t) as it moves with speed $ds/dt$ along $\Gamma$ in the direction of $\mathbf{T}$. These results merit recording as a theorem.

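Theorem 11-2 Let $\mathbf{u}(t) = u_1(t)\mathbf{i} + u_2(t)\mathbf{j} + u_3(t)\mathbf{k}$ be a differentiable vector function of the real variable $t$, then

$$\frac{d\mathbf{u}}{dt} = \frac{du_1}{dt}\mathbf{i} + \frac{du_2}{dt}\mathbf{j} + \frac{du_3}{dt}\mathbf{k}.$$

If $\Gamma$ denotes the curve traced out by the point P(t) with position vector $\mathbf{u}(t)$ as $t$ increases, and $s$ is the distance to P(t) measured along $\Gamma$ from some fixed point, then

$$\frac{ds}{dt} = \left|\frac{d\mathbf{u}}{dt}\right|$$

and the unit tangent $\mathbf{T}$ to the curve $\Gamma$ at P(t) oriented in the sense of increasing $t$ is

$$\mathbf{T} = \frac{d\mathbf{u}}{dt} \bigg/ \left|\frac{d\mathbf{u}}{dt}\right|.$$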

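As a consequence of this theorem we may write

$$\frac{d\mathbf{u}}{dt} = \frac{ds}{dt}\left(\frac{d\mathbf{u}}{dt} \bigg/ \left|\frac{d\mathbf{u}}{dt}\right|\right) = \frac{ds}{dt}\mathbf{T}, \tag{11-7}$$

which is a result of considerable use in dynamics when $t$ is identified with time.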
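Theorem 11-2 can be illustrated numerically. The Python sketch below is an assumption-laden example rather than anything from the text: it replaces the limit in Definition 11-2 by a central difference with a small step `h` (a numerical approximation, not the exact derivative), applies it component by component to the helix of Eqns (11-1), and returns the speed $ds/dt = |d\mathbf{u}/dt|$ and the unit tangent $\mathbf{T}$. The helper names are hypothetical.

```python
import math


def helix(t):
    """Position vector components of the helix x = 2 cos 2*pi*t, y = 2 sin 2*pi*t, z = 2t."""
    return (2.0 * math.cos(2.0 * math.pi * t),
            2.0 * math.sin(2.0 * math.pi * t),
            2.0 * t)


def derivative(curve, t, h=1e-6):
    """Central-difference estimate of du/dt, taken component by component."""
    p, q = curve(t + h), curve(t - h)
    return tuple((a - b) / (2.0 * h) for a, b in zip(p, q))


def norm(a):
    return math.sqrt(sum(x * x for x in a))


def speed_and_tangent(curve, t):
    """Return ds/dt = |du/dt| and the unit tangent T = (du/dt) / |du/dt|."""
    d = derivative(curve, t)
    s = norm(d)
    return s, tuple(x / s for x in d)


if __name__ == "__main__":
    s, T = speed_and_tangent(helix, 0.3)
    # For this helix the exact speed is sqrt(16*pi^2 + 4), independent of t.
    print(s, math.sqrt(16.0 * math.pi**2 + 4.0), T)
```

Multiplying the returned speed by the returned tangent reproduces the estimated velocity vector, which is the content of Eqn (11-7).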

Higher order derivatives such as $d^2\mathbf{u}/dt^2$ and $d^3\mathbf{u}/dt^3$ may also be defined in the obvious fashion as $d^2\mathbf{u}/dt^2 = (d/dt)(d\mathbf{u}/dt)$, $d^3\mathbf{u}/dt^3 = (d/dt)(d^2\mathbf{u}/dt^2)$ provided only that the components of $\mathbf{u}(t)$ have suitable differentiability properties. Thus, for example, if the second derivatives of the components of $\mathbf{u}(t)$ exist we have

$$\frac{d^2\mathbf{u}}{dt^2} = \frac{d^2 u_1}{dt^2}\mathbf{i} + \frac{d^2 u_2}{dt^2}\mathbf{j} + \frac{d^2 u_3}{dt^2}\mathbf{k}. \tag{11-8}$$

We have seen that if $t$ is identified with time and $\mathbf{u}(t)$ is the position vector of a point P, then $d\mathbf{u}/dt$ is the velocity vector of P. It follows from this same argument that $d^2\mathbf{u}/dt^2$ is the acceleration vector of P.

Example 11-2 The position vector $\mathbf{r}$ of a particle at time $t$ is given by

$$\mathbf{r} = a \cos \omega t\,\mathbf{i} + a \sin \omega t\,\mathbf{j} + \alpha t^2\mathbf{k},$$

where $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ have their usual meanings and $a$, $\omega$, and $\alpha$ are constants. Find the acceleration vector at time $t$, and deduce the times at which it will be perpendicular to the position vector. Hence deduce the unit tangent to the particle trajectory at these times.

Solution By making the identifications $\mathbf{u} = \mathbf{r}$, $u_1(t) = a \cos \omega t$, $u_2(t) = a \sin \omega t$ and $u_3(t) = \alpha t^2$ and then applying Theorem 11-2 we find that the velocity vector is

$$\frac{d\mathbf{r}}{dt} = -a\omega \sin \omega t\,\mathbf{i} + a\omega \cos \omega t\,\mathbf{j} + 2\alpha t\,\mathbf{k}.$$

A further differentiation yields the required acceleration vector

$$\frac{d^2\mathbf{r}}{dt^2} = -a\omega^2 \cos \omega t\,\mathbf{i} - a\omega^2 \sin \omega t\,\mathbf{j} + 2\alpha\,\mathbf{k}.$$

Expressed vectorially, the condition that $\mathbf{r}$ and $d^2\mathbf{r}/dt^2$ should be perpendicular is simply that $\mathbf{r} \cdot (d^2\mathbf{r}/dt^2) = 0$. Hence to find the time at which this condition is satisfied we must solve the equation

$$(a \cos \omega t\,\mathbf{i} + a \sin \omega t\,\mathbf{j} + \alpha t^2\mathbf{k}) \cdot (-a\omega^2 \cos \omega t\,\mathbf{i} - a\omega^2 \sin \omega t\,\mathbf{j} + 2\alpha\,\mathbf{k}) = 0.$$

Forming the required scalar product gives

$$-a^2\omega^2 \cos^2 \omega t - a^2\omega^2 \sin^2 \omega t + 2\alpha^2 t^2 = 0$$

which immediately simplifies to

$$a^2\omega^2 = 2\alpha^2 t^2,$$

showing that the desired times are

$$t = \pm\frac{a\omega}{\alpha\sqrt{2}}.$$

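To deduce the unit tangent $\mathbf{T}$ at these times we use the fact that

$$\mathbf{T} = \frac{d\mathbf{r}}{dt} \bigg/ \left|\frac{d\mathbf{r}}{dt}\right|$$

where here

$$\left|\frac{d\mathbf{r}}{dt}\right| = \sqrt{(a^2\omega^2 + 4\alpha^2 t^2)}.$$

Denoting by $\mathbf{T}_{\pm}$ the unit tangent to the trajectory at $t = \pm a\omega/\alpha\sqrt{2}$, we find by substitution of these values of $t$ in the above expression that

$$\mathbf{T}_{+} = \frac{1}{\sqrt{3}}\left(-\sin\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{i} + \cos\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{j} + \sqrt{2}\,\mathbf{k}\right)$$

and

$$\mathbf{T}_{-} = \frac{1}{\sqrt{3}}\left(\sin\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{i} + \cos\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{j} - \sqrt{2}\,\mathbf{k}\right).$$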
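The result of Example 11-2 can be checked for particular constants. In the Python sketch below the function name `example_11_2` and the sample values of $a$, $\omega$, $\alpha$ are assumptions chosen for illustration (all taken positive). It evaluates $\mathbf{r}$, the velocity and the acceleration at $t = a\omega/\alpha\sqrt{2}$, and returns $\mathbf{r} \cdot d^2\mathbf{r}/dt^2$ together with the unit tangent.

```python
import math


def example_11_2(a, omega, alpha):
    """Return (t_plus, r . acceleration, unit tangent) for
    r = a cos(omega t) i + a sin(omega t) j + alpha t^2 k
    at the positive time t_plus = a*omega / (alpha*sqrt(2)).
    Assumes a, omega and alpha are all positive.
    """
    t = a * omega / (alpha * math.sqrt(2.0))
    r = (a * math.cos(omega * t), a * math.sin(omega * t), alpha * t * t)
    vel = (-a * omega * math.sin(omega * t),
           a * omega * math.cos(omega * t),
           2.0 * alpha * t)
    acc = (-a * omega**2 * math.cos(omega * t),
           -a * omega**2 * math.sin(omega * t),
           2.0 * alpha)
    r_dot_acc = sum(x * y for x, y in zip(r, acc))
    speed = math.sqrt(sum(x * x for x in vel))
    T = tuple(x / speed for x in vel)
    return t, r_dot_acc, T


if __name__ == "__main__":
    t, r_dot_acc, T = example_11_2(1.5, 2.0, 0.7)
    print(t, r_dot_acc, T)
```

With these sample values `r_dot_acc` should vanish to rounding error, and `T` should have $\mathbf{k}$ component $\sqrt{2}/\sqrt{3}$.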

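With the obvious differentiability requirements, if $\mathbf{u}(t)$ and $\mathbf{v}(t)$ are differentiable vector functions with respect to $t$, then so also are $\mathbf{u} + \mathbf{v}$, $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{u} \times \mathbf{v}$, and $\phi\mathbf{u}$, where $\phi = \phi(t)$ is a scalar function of $t$. As the following theorem is easily proved by resolution of the vector functions involved into component form, it is stated without proof.

Theorem 11-3 (differentiation, sums and products of vector functions) If $\mathbf{u}(t)$ and $\mathbf{v}(t)$ are differentiable vector functions throughout some interval $a < t < b$ and $\phi(t)$ is a differentiable scalar function throughout that same interval then,

(a) $\dfrac{d}{dt}(\mathbf{u} + \mathbf{v}) = \dfrac{d\mathbf{u}}{dt} + \dfrac{d\mathbf{v}}{dt}$;

(b) $\dfrac{d}{dt}(\phi\mathbf{u}) = \dfrac{d\phi}{dt}\mathbf{u} + \phi\dfrac{d\mathbf{u}}{dt}$;

(c) $\dfrac{d}{dt}(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot \dfrac{d\mathbf{v}}{dt} + \dfrac{d\mathbf{u}}{dt} \cdot \mathbf{v}$;

(d) $\dfrac{d}{dt}(\mathbf{u} \times \mathbf{v}) = \mathbf{u} \times \dfrac{d\mathbf{v}}{dt} + \dfrac{d\mathbf{u}}{dt} \times \mathbf{v}$;

and, if $\mathbf{c}$ is a constant vector,

(e) $\dfrac{d\mathbf{c}}{dt} = \mathbf{0}$;

where the order of the vector products on the right-hand side of (d) must be strictly observed.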
When considering the geometry of twisted curves in space it is convenient to identify points on a curve $\Gamma$ by specifying their distance $s$ measured along the curve itself from some fixed point. This is of course equivalent to identifying $t$ with $s$ in the position vector $\mathbf{r}(t)$ so that $\Gamma$ is then defined as the locus of the points having the equation $\mathbf{r} = \mathbf{r}(s)$. This equation is called the intrinsic equation of the curve $\Gamma$. In terms of the intrinsic equation it follows from Eqn (11-7) that the unit tangent $\mathbf{T}$ to the curve $\Gamma$ at $\mathbf{r} = \mathbf{r}(s)$ is

$$\mathbf{T} = \frac{d\mathbf{r}}{ds}. \tag{11-9}$$

Now although $\mathbf{T}$ is a vector function of $s$, it is also a unit vector, and so $\mathbf{T} \cdot \mathbf{T} = 1$. Differentiating this scalar product with respect to $s$ by means of Theorem 11-3 (c) then gives

$$\frac{d\mathbf{T}}{ds} \cdot \mathbf{T} + \mathbf{T} \cdot \frac{d\mathbf{T}}{ds} = 0$$

or, as vectors in a scalar product commute,

$$\mathbf{T} \cdot \frac{d\mathbf{T}}{ds} = 0.$$

Hence, provided $d\mathbf{T}/ds \neq \mathbf{0}$, the derivative of the unit tangent $\mathbf{T}$ with respect to $s$ is normal to $\mathbf{T}$. Next, denoting by $\mathbf{N}$ the unit vector along $d\mathbf{T}/ds$, we define the essentially positive scalar function $\kappa = \kappa(s)$ by means of the equation

$$\frac{d\mathbf{T}}{ds} = \kappa\mathbf{N}. \tag{11-10}$$

ds 

Here $\kappa$ is called the curvature of the curve at the point in question, and on account of the relationship between $\mathbf{T}$ and $\mathbf{N}$, the vector $\mathbf{N}$ is called the principal unit normal to the curve $\Gamma$ at that point. As $\kappa$ is positive by definition and $\mathbf{N}$ is a unit vector it follows from Eqn (11-10) that

$$\kappa = \left|\frac{d\mathbf{T}}{ds}\right|. \tag{11-11}$$

It is convenient to define a third and mutually orthogonal unit vector $\mathbf{B}$ called the unit binormal by means of the equation

$$\mathbf{B} = \mathbf{T} \times \mathbf{N}. \tag{11-12}$$

The three unit vectors $\mathbf{B}$, $\mathbf{T}$, and $\mathbf{N}$ are, in general, all functions of $s$ and they serve as a specially useful triad of mutually orthogonal unit reference vectors at points on the curve $\Gamma$. It is important to appreciate that in general $\mathbf{B}$, $\mathbf{T}$, $\mathbf{N}$, and $\kappa$ vary from point to point on the curve $\Gamma$, being always defined in relation to the local properties of the curve in question. The positive number $\rho = 1/\kappa$ defined at each point of the curve $\Gamma$ is called the radius of curvature of the curve at that point.

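Example 11-3 Find $\mathbf{B}$, $\mathbf{T}$, $\mathbf{N}$, and the scalars $\kappa$, $\rho$ for the curve defined parametrically in terms of $t$ by the expression

$$\mathbf{r} = 2\cos(t + \mu)\mathbf{i} - 2\sin(t + \mu)\mathbf{j} + 4t\mathbf{k},$$

where $\mu$ is a constant. Hence deduce the values of these quantities at the point on the curve corresponding to $t = 0$.

Solution First notice that $t$ is not the arc length $s$ along the curve, because were this the case then it would follow that $ds/dt = 1$, whereas from Eqn (11-5) we have

$$\frac{ds}{dt} = \sqrt{[4\sin^2(t + \mu) + 4\cos^2(t + \mu) + 16]} = 2\sqrt{5}.$$

Now, using Eqn (11-9) we have

$$\mathbf{T} = \frac{d\mathbf{r}}{ds} = \frac{d\mathbf{r}}{dt}\frac{dt}{ds} = \left(\frac{d\mathbf{r}}{dt}\right) \bigg/ \left(\frac{ds}{dt}\right)$$

whence

$$\mathbf{T} = \frac{1}{2\sqrt{5}}\left(\frac{d\mathbf{r}}{dt}\right).$$

Thus

$$\mathbf{T} = \frac{1}{2\sqrt{5}}\frac{d}{dt}\big(2\cos(t + \mu)\mathbf{i} - 2\sin(t + \mu)\mathbf{j} + 4t\mathbf{k}\big) = \frac{1}{2\sqrt{5}}\big(-2\sin(t + \mu)\mathbf{i} - 2\cos(t + \mu)\mathbf{j} + 4\mathbf{k}\big),$$

and so

$$\mathbf{T} = -\frac{1}{\sqrt{5}}\big(\sin(t + \mu)\mathbf{i} + \cos(t + \mu)\mathbf{j} - 2\mathbf{k}\big).$$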

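Next, to find $\mathbf{N}$ and $\kappa$ we write Eqn (11-10) as

$$\kappa\mathbf{N} = \frac{d\mathbf{T}}{ds} = \frac{d\mathbf{T}}{dt}\frac{dt}{ds} = \left(\frac{d\mathbf{T}}{dt}\right) \bigg/ \left(\frac{ds}{dt}\right).$$

Hence

$$\kappa\mathbf{N} = \frac{1}{2\sqrt{5}}\frac{d}{dt}\left(\frac{-\sin(t + \mu)\mathbf{i} - \cos(t + \mu)\mathbf{j} + 2\mathbf{k}}{\sqrt{5}}\right) = \frac{1}{10}\big(-\cos(t + \mu)\mathbf{i} + \sin(t + \mu)\mathbf{j}\big).$$

Using $\kappa = |d\mathbf{T}/ds|$, it then follows that

$$\kappa = \frac{1}{10}\left|-\cos(t + \mu)\mathbf{i} + \sin(t + \mu)\mathbf{j}\right| = \frac{1}{10}$$

and, consequently, that

$$\mathbf{N} = -\cos(t + \mu)\mathbf{i} + \sin(t + \mu)\mathbf{j}.$$

Since the radius of curvature $\rho$ is defined by the relationship $\rho = 1/\kappa$, we have $\rho = 10$.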

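Finally, using the definition $\mathbf{B} = \mathbf{T} \times \mathbf{N}$ gives

$$\mathbf{B} = -\frac{1}{\sqrt{5}}\big(\sin(t + \mu)\mathbf{i} + \cos(t + \mu)\mathbf{j} - 2\mathbf{k}\big) \times \big(-\cos(t + \mu)\mathbf{i} + \sin(t + \mu)\mathbf{j}\big),$$

whence

$$\mathbf{B} = -\frac{1}{\sqrt{5}}\big(2\sin(t + \mu)\mathbf{i} + 2\cos(t + \mu)\mathbf{j} + \mathbf{k}\big).$$

The point on the curve corresponding to $t = 0$ is $\mathbf{r}(0) = 2\cos\mu\,\mathbf{i} - 2\sin\mu\,\mathbf{j}$, and at this point:

$$\mathbf{T}(0) = -\frac{1}{\sqrt{5}}(\sin\mu\,\mathbf{i} + \cos\mu\,\mathbf{j} - 2\mathbf{k}),$$

$$\mathbf{N}(0) = -\cos\mu\,\mathbf{i} + \sin\mu\,\mathbf{j},$$

$$\mathbf{B}(0) = -\frac{1}{\sqrt{5}}(2\sin\mu\,\mathbf{i} + 2\cos\mu\,\mathbf{j} + \mathbf{k}).$$

The curvature $\kappa = 1/10$ is independent of $t$, and so $\kappa$ is the same for all points on the curve, as is the radius of curvature $\rho = 1/\kappa = 10$.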
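The triad and curvature found in Example 11-3 can be cross-checked by a different route. The Python sketch below is an illustration under stated assumptions, not the method of the text: it uses the standard identities $\kappa = |\mathbf{r}' \times \mathbf{r}''| / |\mathbf{r}'|^3$, $\mathbf{B} = (\mathbf{r}' \times \mathbf{r}'') / |\mathbf{r}' \times \mathbf{r}''|$ and $\mathbf{N} = \mathbf{B} \times \mathbf{T}$, where primes denote differentiation with respect to $t$. The function names are hypothetical.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(sum(x * x for x in a))


def frenet_frame(t, mu=0.0):
    """Return (T, N, B, kappa) for r = 2 cos(t+mu) i - 2 sin(t+mu) j + 4t k."""
    s, c = math.sin(t + mu), math.cos(t + mu)
    d1 = (-2.0 * s, -2.0 * c, 4.0)   # dr/dt
    d2 = (-2.0 * c, 2.0 * s, 0.0)    # d^2 r/dt^2
    speed = norm(d1)                 # ds/dt
    T = tuple(x / speed for x in d1)
    d1xd2 = cross(d1, d2)
    kappa = norm(d1xd2) / speed**3   # standard curvature identity
    B = tuple(x / norm(d1xd2) for x in d1xd2)
    N = cross(B, T)                  # N = B x T, as in Eqn (11-15)
    return T, N, B, kappa


if __name__ == "__main__":
    T, N, B, kappa = frenet_frame(0.4, mu=0.3)
    print(T, N, B, kappa, 1.0 / kappa)
```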

Thus far we have defined the triad of unit vectors $\mathbf{B}$, $\mathbf{T}$, and $\mathbf{N}$ which serve as a moving set of reference vectors along the curve $\Gamma$. We have also calculated the derivative $d\mathbf{T}/ds$, and to complete our examination of these vectors it only remains for us to find $d\mathbf{B}/ds$ and $d\mathbf{N}/ds$. For our starting point we take Eqn (11-12), which we differentiate with respect to $s$, using Theorem 11-3 (d), to obtain

$$\frac{d\mathbf{B}}{ds} = \frac{d\mathbf{T}}{ds} \times \mathbf{N} + \mathbf{T} \times \frac{d\mathbf{N}}{ds}$$

which, on account of Eqn (11-10), reduces to

$$\frac{d\mathbf{B}}{ds} = \mathbf{T} \times \frac{d\mathbf{N}}{ds}.$$

Next, forming the vector product of this equation with $\mathbf{N}$ and expanding the resulting triple vector product on the right-hand side gives

$$\mathbf{N} \times \frac{d\mathbf{B}}{ds} = \left(\mathbf{N} \cdot \frac{d\mathbf{N}}{ds}\right)\mathbf{T} - (\mathbf{N} \cdot \mathbf{T})\frac{d\mathbf{N}}{ds}.$$

However as $\mathbf{N}$ is a unit vector it follows, as in the derivation of Eqn (11-10), that $\mathbf{N} \cdot (d\mathbf{N}/ds) = 0$, whilst the orthogonality of $\mathbf{N}$ and $\mathbf{T}$ implies that $\mathbf{N} \cdot \mathbf{T} = 0$. Thus,

$$\mathbf{N} \times \frac{d\mathbf{B}}{ds} = \mathbf{0},$$

and hence the vectors $\mathbf{N}$ and $d\mathbf{B}/ds$ must be parallel, differing only by a scalar factor. This scalar factor is usually a function of $s$ and it is called the torsion of the curve $\Gamma$. Torsion is conventionally denoted by $-\tau$, so we can write

$$\frac{d\mathbf{B}}{ds} = -\tau\mathbf{N}. \tag{11-13}$$

If required, the torsion $\tau$ may be calculated by using the obvious result

$$\tau = -\mathbf{N} \cdot \frac{d\mathbf{B}}{ds}. \tag{11-14}$$

See Problems 11-16 to 11-18 for an alternative treatment of the calculation of $\rho$ and $\tau$.

The manner of construction of $\mathbf{B}$, $\mathbf{T}$, and $\mathbf{N}$ is such that they form a right-handed set in this order and, consequently,

$$\mathbf{B} = \mathbf{T} \times \mathbf{N}, \qquad \mathbf{T} = \mathbf{N} \times \mathbf{B}, \qquad \mathbf{N} = \mathbf{B} \times \mathbf{T}. \tag{11-15}$$

This relationship is indicated in Fig. 11-3 for a point P on the curve $\Gamma$.

Fig. 11-3 Moving triad of reference vectors.

To find $d\mathbf{N}/ds$ we differentiate the last result of Eqn (11-15) with respect to $s$, and use Eqns (11-10), (11-13) together with the other results of Eqn (11-15) to obtain,

$$\frac{d\mathbf{N}}{ds} = \frac{d\mathbf{B}}{ds} \times \mathbf{T} + \mathbf{B} \times \frac{d\mathbf{T}}{ds} = -\tau\mathbf{N} \times \mathbf{T} + \kappa\mathbf{B} \times \mathbf{N},$$

whence

$$\frac{d\mathbf{N}}{ds} = \tau\mathbf{B} - \kappa\mathbf{T}. \tag{11-16}$$

The study of the geometrical properties of space curves using the calculus techniques is called the differential geometry of curves, and it has as its basis the three equations

$$\frac{d\mathbf{T}}{ds} = \kappa\mathbf{N}, \qquad \frac{d\mathbf{B}}{ds} = -\tau\mathbf{N}, \qquad \frac{d\mathbf{N}}{ds} = \tau\mathbf{B} - \kappa\mathbf{T}, \tag{11-17}$$

which are called the Serret-Frenet equations. Naturally, similar ideas lead to the differential geometry of surfaces, though we shall make no further use of such ideas in this first account of the subject.

Example 11-4 Find the torsion of the circular helix of Example 11-3. 

Solution In the previous example it was shown that dsjdt = 1/(2 \/5) and 
N = — cos (t + /x)i + sin {t + (i)\, 

B = (2 sin (/ + (i)i + 2 cos (t + fi)\ + k). 

Hence, 



dB /dB\ l/ds\ 1 , . , 

d7 = (d7)/(d7J = 5 (cos( ' + ^ ), - sin(/ + ^ ) - 



An application of Eqn (11-14) gives

t = — J [- cos (t + ,a)i + sin (t + fj,)\] . [cos {t + fi)i — sin (t + /j)\] = ^. 

This result might have been anticipated, for the circular helix in question is 
similar to a screw thread with a constant pitch, and consequently its curvature 
and twist properties must be the same at all points. 

11-2 Antiderivatives and integrals of vector functions 

The notion of an antiderivative, already encountered in Chapter 8, extends
naturally to a vector function of a real variable.

definition 11-3 (antiderivative: vector function) The vector function
$\mathbf{F}(t)$ of the real variable t will be said to be the antiderivative of the vector
function $\mathbf{f}(t)$ if

\[ \frac{d\mathbf{F}}{dt} = \mathbf{f}(t). \]

Naturally, an antiderivative F(t) is indeterminate so far as an additive 
arbitrary constant vector C is concerned, because by Theorem 11-3 (e), 
dC/dt = 0. Continuing the convention adopted in Chapter 8, the operation 
of antidifferentiation with respect to a vector function of the single real 
variable t will be denoted by $\int$, so that

\[ \int \mathbf{f}(t)\,dt = \mathbf{F}(t) + \mathbf{C}, \tag{11-18} \]

where C is an arbitrary constant vector. 

It is obvious that Eqn (11-18), when taken in conjunction with Theorem 
11-2, implies the following result. 

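theorem 11-4 (antiderivative of vector function) If

\[ \int \mathbf{f}(t)\,dt = \mathbf{F}(t) + \mathbf{C}, \]

where $\mathbf{f}(t) = f_1(t)\mathbf{i} + f_2(t)\mathbf{j} + f_3(t)\mathbf{k}$, $\mathbf{F}(t) = F_1(t)\mathbf{i} + F_2(t)\mathbf{j} + F_3(t)\mathbf{k}$ and
$\mathbf{C} = C_1\mathbf{i} + C_2\mathbf{j} + C_3\mathbf{k}$ is an arbitrary constant vector, then

\[ \int f_i(t)\,dt = F_i(t) + C_i, \quad i = 1, 2, 3, \]

with

\[ \frac{dF_i}{dt} = f_i(t). \]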

Expressed in words, the antiderivative of $\mathbf{f}(t)$ has components equal to the
antiderivatives of the components of $\mathbf{f}(t)$. As with the scalar case, in many
books the entire right-hand side of Eqn (11-18) is loosely referred to as the
indefinite integral of the vector function $\mathbf{f}(t)$, rather than as here using this
term to refer only to its first member.

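Example 11-5 Find the antiderivative of $\mathbf{f}(t)$ given that

\[ \mathbf{f}(t) = \cos t\,\mathbf{i} + (1 + t^2)\,\mathbf{j} + e^{-t}\,\mathbf{k}. \]

Solution It follows immediately from Theorem 11-4 that

\[ \int \mathbf{f}(t)\,dt = \mathbf{i}\int \cos t\,dt + \mathbf{j}\int (1 + t^2)\,dt + \mathbf{k}\int e^{-t}\,dt = \sin t\,\mathbf{i} + \left(t + \frac{t^3}{3}\right)\mathbf{j} - e^{-t}\,\mathbf{k} + \mathbf{C}. \]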
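
The componentwise rule of Theorem 11-4 can be checked numerically. The following is only a minimal sketch: the function names `f`, `F`, `derivative`, and `max_error` are illustrative choices, and the arbitrary constant vector C is taken to be zero. It differentiates the candidate antiderivative of Example 11-5 by central differences and compares the result with the integrand.

```python
import math

def f(t):
    """Integrand of Example 11-5 as an (i, j, k) tuple."""
    return (math.cos(t), 1.0 + t * t, math.exp(-t))

def F(t):
    """Candidate antiderivative, with the arbitrary constant C set to zero."""
    return (math.sin(t), t + t ** 3 / 3.0, -math.exp(-t))

def derivative(G, t, h=1e-5):
    """Central-difference estimate of dG/dt, taken component by component."""
    upper, lower = G(t + h), G(t - h)
    return tuple((u - l) / (2.0 * h) for u, l in zip(upper, lower))

def max_error(t):
    """Largest componentwise gap between dF/dt and f at the point t."""
    return max(abs(d - c) for d, c in zip(derivative(F, t), f(t)))

# Sample a few values of t; each error should be tiny if dF/dt = f.
errors = [max_error(t) for t in (-1.0, 0.0, 0.5, 2.0)]
print(errors)
```

Adding any constant vector to `F` leaves the check unchanged, which is the numerical counterpart of the indeterminacy expressed by C.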



The obvious modification to Theorem 11-4 to enable us to work with 
definite integrals of vector functions of a single real variable comprises the 
next theorem. Because it is strictly analogous to the scalar case it is offered 
without proof. 

theorem 11-5 (definite integral of vector function) If $\mathbf{F}(t)$ is an
antiderivative of $\mathbf{f}(t)$, then

\[ \int_a^b \mathbf{f}(t)\,dt = \mathbf{F}(b) - \mathbf{F}(a). \]

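Example 11-6 Evaluate the definite integral

\[ \int_0^{\pi/4} (t^2\,\mathbf{i} + \sec^2 t\,\mathbf{j} + \mathbf{k})\,dt. \]

Solution From Theorem 11-5 we have the result

\[ \int_0^{\pi/4} (t^2\,\mathbf{i} + \sec^2 t\,\mathbf{j} + \mathbf{k})\,dt = \left[\frac{t^3}{3}\,\mathbf{i} + \tan t\,\mathbf{j} + t\,\mathbf{k}\right]_0^{\pi/4} = \frac{\pi^3}{192}\,\mathbf{i} + \mathbf{j} + \frac{\pi}{4}\,\mathbf{k}. \]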
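
A definite integral of a vector function can also be approximated directly, one component at a time. The sketch below is an illustration under stated assumptions: it uses composite Simpson's rule (not a method discussed in the text), the helper name `simpson_vector` is hypothetical, and the upper limit is taken to be π/4, the value for which the j-component tan t equals 1.

```python
import math

def simpson_vector(f, a, b, n=200):
    """Composite Simpson's rule applied to each component of f (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = [0.0] * len(f(a))
    for i in range(n + 1):
        # Simpson weights: 1 at the end points, then 4, 2, 4, ... inside.
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        for j, value in enumerate(f(a + i * h)):
            total[j] += weight * value
    return tuple(h * s / 3.0 for s in total)

def integrand(t):
    """t^2 i + sec^2(t) j + k, the integrand of Example 11-6."""
    return (t * t, 1.0 / math.cos(t) ** 2, 1.0)

result = simpson_vector(integrand, 0.0, math.pi / 4)
# F(b) - F(a) with F = (t^3/3, tan t, t), as given by Theorem 11-5.
exact = (math.pi ** 3 / 192, 1.0, math.pi / 4)
print(result, exact)
```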



A slightly more interesting application of a definite integral is provided 
by the following example concerning the motion of a particle in space. 

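Example 11-7 A point moving in space has acceleration

\[ \sin 2t\,\mathbf{i} - \cos 2t\,\mathbf{k}. \]

Find the equation of its path if it passes through the point with position
vector $\mathbf{r}_0 = \mathbf{j} + 2\mathbf{k}$ with velocity $2\mathbf{j}$ at time t = 0.

Solution If $\mathbf{r}$ is the general position vector of the point at time t, then the
velocity $\mathbf{v}(t) = d\mathbf{r}/dt$ and the acceleration $\mathbf{a}(t) = d^2\mathbf{r}/dt^2$. Hence

\[ \frac{d^2\mathbf{r}}{dt^2} = \sin 2t\,\mathbf{i} - \cos 2t\,\mathbf{k}, \]

so that integrating the acceleration equation from 0 to t and replacing t in the
integrand by the dummy variable $\tau$ gives

\[ \int_0^t \frac{d^2\mathbf{r}}{d\tau^2}\,d\tau = \int_0^t (\sin 2\tau\,\mathbf{i} - \cos 2\tau\,\mathbf{k})\,d\tau. \]

Hence

\[ \left[\frac{d\mathbf{r}}{d\tau}\right]_0^t = -\tfrac{1}{2}\Big[\cos 2\tau\,\mathbf{i} + \sin 2\tau\,\mathbf{k}\Big]_0^t \]

and so

\[ \mathbf{v}(t) = \mathbf{v}_0 + \tfrac{1}{2}(1 - \cos 2t)\,\mathbf{i} - \tfrac{1}{2}\sin 2t\,\mathbf{k}. \]

Now from the initial conditions of the problem $\mathbf{v}_0 = 2\mathbf{j}$, so that the velocity
equation becomes

\[ \mathbf{v}(t) = \tfrac{1}{2}(1 - \cos 2t)\,\mathbf{i} + 2\mathbf{j} - \tfrac{1}{2}\sin 2t\,\mathbf{k}. \]

To find the equation of the path a further integration is required so, setting
$\mathbf{v}(t) = d\mathbf{r}/dt$, integrating the velocity equation from 0 to t gives

\[ \int_0^t \frac{d\mathbf{r}}{d\tau}\,d\tau = \int_0^t \left(\tfrac{1}{2}(1 - \cos 2\tau)\,\mathbf{i} + 2\mathbf{j} - \tfrac{1}{2}\sin 2\tau\,\mathbf{k}\right)d\tau. \]

Hence

\[ \Big[\mathbf{r}(\tau)\Big]_0^t = \Big[\tfrac{1}{2}\left(\tau - \tfrac{1}{2}\sin 2\tau\right)\mathbf{i} + 2\tau\,\mathbf{j} + \tfrac{1}{4}\cos 2\tau\,\mathbf{k}\Big]_0^t \]

and so

\[ \mathbf{r}(t) = \mathbf{r}_0 + \tfrac{1}{2}\left(t - \tfrac{1}{2}\sin 2t\right)\mathbf{i} + 2t\,\mathbf{j} + \tfrac{1}{4}(\cos 2t - 1)\,\mathbf{k}. \]

Again appealing to the initial conditions of the problem we find that
$\mathbf{r}_0 = \mathbf{j} + 2\mathbf{k}$, so that, finally, the particle path must be

\[ \mathbf{r}(t) = \tfrac{1}{2}\left(t - \tfrac{1}{2}\sin 2t\right)\mathbf{i} + (1 + 2t)\,\mathbf{j} + \tfrac{1}{4}(7 + \cos 2t)\,\mathbf{k}. \]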
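
The path obtained in Example 11-7 can be tested against the data of the problem. This is a rough numerical sketch rather than part of the original solution: the helper names and the finite-difference step sizes are assumptions chosen for illustration. It confirms the initial position and velocity and compares a second-difference estimate of the acceleration with sin 2t i − cos 2t k.

```python
import math

def r(t):
    """Particle path found in Example 11-7, as an (i, j, k) tuple."""
    return (0.5 * (t - 0.5 * math.sin(2 * t)),
            1.0 + 2.0 * t,
            0.25 * (7.0 + math.cos(2 * t)))

def a(t):
    """Prescribed acceleration sin(2t) i - cos(2t) k."""
    return (math.sin(2 * t), 0.0, -math.cos(2 * t))

def first_derivative(g, t, h=1e-5):
    """Central-difference estimate of dg/dt for a vector function g."""
    return tuple((u - l) / (2 * h) for u, l in zip(g(t + h), g(t - h)))

def second_derivative(g, t, h=1e-4):
    """Second-difference estimate of d^2 g/dt^2 for a vector function g."""
    return tuple((u - 2 * m + l) / (h * h)
                 for u, m, l in zip(g(t + h), g(t), g(t - h)))

start = r(0.0)                        # should equal r_0 = j + 2k
v0 = first_derivative(r, 0.0)         # should be close to 2j
acc_gap = max(abs(x - y) for x, y in zip(second_derivative(r, 0.7), a(0.7)))
print(start, v0, acc_gap)
```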

The form of definite integral of a vector function so far considered is 
itself a vector. We now discuss one final generalization of the notion of a 
definite integral involving a vector function that generates a scalar. 

Let a curve $\Gamma$ defined parametrically in terms of the arc length s have the
general position vector $\mathbf{r} = \mathbf{r}(s)$ and unit tangent vector $\mathbf{T}(s)$, and let $\mathbf{F}(s)$
be a vector function of s. Then at any point of $\Gamma$ the scalar function
$\phi(s) = \mathbf{F}(s) \cdot \mathbf{T}(s)$ represents the component of $\mathbf{F}(s)$ tangential to $\Gamma$. If the scalar
function $\phi(s)$ is then integrated from s = a to s = b, this is equivalent
to integrating the tangential component of $\mathbf{F}(s)$ along $\Gamma$ from the point
$\mathbf{r} = \mathbf{r}(a)$ to the point $\mathbf{r} = \mathbf{r}(b)$. An integral of this form is therefore called
either a line integral or a curvilinear integral of the vector function $\mathbf{F}(s)$ taken
along the curve $\Gamma$, which is sometimes referred to as the path of integration.

definition 11-4 (line integral of vector function) The line integral of the
vector function $\mathbf{F}(s)$ taken along the curve $\Gamma$ between the points A and B
with position vectors $\mathbf{r} = \mathbf{r}(a)$ and $\mathbf{r} = \mathbf{r}(b)$, respectively, is the quantity

\[ J = \int_a^b \phi(s)\,ds = \int_a^b \mathbf{F} \cdot \mathbf{T}\,ds, \]

where $\phi(s) = \mathbf{F}(s) \cdot \mathbf{T}(s)$, s denotes arc length along $\Gamma$, and $\mathbf{T}(s)$ is the unit
tangent vector to $\Gamma$.






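In terms of the general position vector $\mathbf{r}$ of a point on the curve and the
fact that s is the arc length along $\Gamma$, we have the relationship
$d\mathbf{r} = \mathbf{T}\,ds$, so that the line integral may also be written

\[ J = \int_a^b \mathbf{F} \cdot d\mathbf{r} \]

or, more simply still if $\Gamma$ denotes part of a curve, as

\[ J = \int_\Gamma \mathbf{F} \cdot d\mathbf{r}. \]

In component form, setting the differential $d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}$ and
$\mathbf{F} = F_1\mathbf{i} + F_2\mathbf{j} + F_3\mathbf{k}$, we have at once

\[ \int_\Gamma \mathbf{F} \cdot d\mathbf{r} = \int_\Gamma F_1\,dx + F_2\,dy + F_3\,dz. \tag{11-19} \]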



If desired, the line integral (11-19) may be defined vectorially in terms of
the limit of a sum in a manner strictly analogous to the definition of an
ordinary definite integral. To achieve this, let the interval $a \le s \le b$ be
divided into n sub-intervals $s_{i-1} \le s \le s_i$, with i = 1, 2, . . ., n, where
$s_0 = a$ and $s_n = b$. Then setting $d\mathbf{r}_i = \mathbf{r}(s_i) - \mathbf{r}(s_{i-1})$ as in Fig. 11-4, the line
integral (11-19) may be approximated by the sum

\[ J_n = \sum_{i=1}^{n} \mathbf{F}(s_i) \cdot d\mathbf{r}_i. \tag{11-20} \]

Fig. 11-4 Line integral of F along $\Gamma$.

If the number of sub-divisions n is now allowed to tend to infinity in such a
manner that the lengths of all the sub-divisions tend to zero then, as with an
ordinary definite integral, we arrive at the result

\[ \int_\Gamma \mathbf{F} \cdot d\mathbf{r} = \lim_{n \to \infty} \sum_{i=1}^{n} \mathbf{F}(s_i) \cdot d\mathbf{r}_i. \tag{11-21} \]

When used in this context, the differential $d\mathbf{r}_i$ is usually called a line element
of the curve $\Gamma$ joining A to B.

Example 11-8 Evaluate the line integral

\[ \int_\Gamma \mathbf{F} \cdot d\mathbf{r}, \]

given that $\mathbf{F} = yz\,\mathbf{i} + 2xz\,\mathbf{j} + xy\,\mathbf{k}$ and $\Gamma$ is that part of the circular helix
$x = a\cos t$, $y = a\sin t$, $z = kt$ that corresponds to the interval $0 \le t \le 2\pi$.

Solution First we use Eqn (11-19) to write the line integral as

\[ \int_\Gamma \mathbf{F} \cdot d\mathbf{r} = \int_\Gamma yz\,dx + 2xz\,dy + xy\,dz. \]

Now along the path $\Gamma$ we have the relationships

\[ x = a\cos t, \quad y = a\sin t, \quad z = kt, \]

which imply the differential relationships

\[ dx = -a\sin t\,dt, \quad dy = a\cos t\,dt, \quad dz = k\,dt. \]

Hence

\[ \int_\Gamma \mathbf{F} \cdot d\mathbf{r} = \int_0^{2\pi} \left(-a^2 kt\sin^2 t + 2a^2 kt\cos^2 t + a^2 k\sin t\cos t\right)dt \]

\[ = a^2 k\left[\frac{t^2}{4} + \frac{3t\sin 2t}{4} + \frac{\cos 2t}{8}\right]_0^{2\pi} = \pi^2 a^2 k. \]
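
The reduction of a line integral to an ordinary integral over the parameter t can be imitated numerically. The sketch below makes several assumptions: it takes the integrand exactly as expanded in the Solution, yz dx + 2xz dy + xy dz, uses Simpson's rule, and picks the sample values a = 2 and k = 0.5 purely for illustration. The helper name `line_integral` is hypothetical.

```python
import math

def line_integral(F, path, dpath, t0, t1, n=2000):
    """Simpson estimate of the integral of F(r(t)) . r'(t) over t0 <= t <= t1 (n even)."""
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n + 1):
        t = t0 + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        # Dot product of the field on the curve with the tangent dr/dt.
        total += weight * sum(Fc * dc for Fc, dc in zip(F(*path(t)), dpath(t)))
    return h * total / 3.0

a_radius, k_pitch = 2.0, 0.5   # illustrative values, not taken from the text

def F(x, y, z):
    """Field whose line integral is yz dx + 2xz dy + xy dz."""
    return (y * z, 2 * x * z, x * y)

def helix(t):
    return (a_radius * math.cos(t), a_radius * math.sin(t), k_pitch * t)

def helix_tangent(t):
    return (-a_radius * math.sin(t), a_radius * math.cos(t), k_pitch)

value = line_integral(F, helix, helix_tangent, 0.0, 2 * math.pi)
print(value, math.pi ** 2 * a_radius ** 2 * k_pitch)
```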



11-3 Some applications 

Kinematics, an important branch of mechanics, is essentially concerned with
the geometrical aspect of the motion of particles along curves. Of particular
importance is that class of motions that occur entirely in one plane, and so
are called planar motions. In many of these situations, for example, particle
motion in an orbit, the position of a particle is best defined in terms of the
polar coordinates $(r, \theta)$ in the plane of the motion. Let us then determine
expressions for the velocity and acceleration of a particle in terms of polar
coordinates.

Fig. 11-5 Planar motion of particle in terms of polar coordinates.

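We first appeal to Fig. 11-5, which represents a particle P moving in the
indicated direction along the curve $\Gamma$. The unit vectors $\mathbf{R}$, $\boldsymbol{\Theta}$ are normal to
each other and are such that $\mathbf{R}$ is directed from O to P along the radius
vector OP, and $\boldsymbol{\Theta}$ points in the direction of increasing $\theta$. Then clearly $\mathbf{R}$ and
$\boldsymbol{\Theta}$ are vector functions of the single variable $\theta$, with

\[ \mathbf{R} = \cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j} \]

and

\[ \boldsymbol{\Theta} = -\sin\theta\,\mathbf{i} + \cos\theta\,\mathbf{j}. \]

It follows from these relationships that

\[ \frac{d\mathbf{R}}{d\theta} = \boldsymbol{\Theta} \tag{11-22} \]

and

\[ \frac{d\boldsymbol{\Theta}}{d\theta} = -\mathbf{R}. \tag{11-23} \]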



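In terms of the unit vectors $\mathbf{R}$, $\boldsymbol{\Theta}$ the point P has the position vector

\[ \mathbf{r} = r\mathbf{R}, \tag{11-24} \]

so that the velocity $d\mathbf{r}/dt$ must be

\[ \frac{d\mathbf{r}}{dt} = \frac{dr}{dt}\mathbf{R} + r\frac{d\mathbf{R}}{dt} = \frac{dr}{dt}\mathbf{R} + r\frac{d\mathbf{R}}{d\theta}\frac{d\theta}{dt}, \]

showing that the velocity vector of P is

\[ \dot{\mathbf{r}} = \dot{r}\mathbf{R} + r\dot{\theta}\boldsymbol{\Theta}, \tag{11-25} \]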



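where differentiation with respect to time has been denoted by a dot.

Here the quantity $\dot{r}$ is called the radial component of velocity and $r\dot{\theta}$ is
called the transverse component of velocity. A further differentiation with
respect to time yields for the acceleration vector $\ddot{\mathbf{r}} = d^2\mathbf{r}/dt^2$ the expression

\[ \ddot{\mathbf{r}} = \ddot{r}\mathbf{R} + \dot{r}\dot{\mathbf{R}} + \dot{r}\dot{\theta}\boldsymbol{\Theta} + r\ddot{\theta}\boldsymbol{\Theta} + r\dot{\theta}\dot{\boldsymbol{\Theta}} \]

or

\[ \ddot{\mathbf{r}} = \ddot{r}\mathbf{R} + \dot{r}\dot{\theta}\frac{d\mathbf{R}}{d\theta} + (\dot{r}\dot{\theta} + r\ddot{\theta})\boldsymbol{\Theta} + r\dot{\theta}^2\frac{d\boldsymbol{\Theta}}{d\theta}. \]

Hence by Eqns (11-22) and (11-23) this is seen to be equivalent to

\[ \ddot{\mathbf{r}} = (\ddot{r} - r\dot{\theta}^2)\mathbf{R} + (2\dot{r}\dot{\theta} + r\ddot{\theta})\boldsymbol{\Theta}. \tag{11-26} \]

The quantity $\ddot{r} - r\dot{\theta}^2$ is called the radial component of acceleration, and
$2\dot{r}\dot{\theta} + r\ddot{\theta}$ is called the transverse component of acceleration.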

Example 11-9 A particle is constrained to move with constant speed v
along the cardioid $r = a(1 + \cos\theta)$. Prove that

\[ v = 2a\dot{\theta}\cos\left(\frac{\theta}{2}\right), \]

and show that the radial component of the acceleration is constant.

Solution From Eqn (11-25) and the expression $r = a(1 + \cos\theta)$, it follows
that the velocity vector $\dot{\mathbf{r}}$ is given by

\[ \dot{\mathbf{r}} = -a\sin\theta\,\dot{\theta}\,\mathbf{R} + a(1 + \cos\theta)\dot{\theta}\,\boldsymbol{\Theta}. \]

Now as $v^2 = \dot{\mathbf{r}} \cdot \dot{\mathbf{r}}$, we have

\[ v^2 = a^2\dot{\theta}^2\sin^2\theta + a^2\dot{\theta}^2(1 + \cos\theta)^2 = 2a^2\dot{\theta}^2(1 + \cos\theta). \]

Using the identity $1 + \cos\theta = 2\cos^2(\theta/2)$ in this expression and taking the
square root yields the required result

\[ v = 2a\dot{\theta}\cos(\theta/2). \]

To complete the problem we now make appeal to the fact that the radial
acceleration component is $\ddot{r} - r\dot{\theta}^2$, whilst by supposition v = constant.
From our previous working we know that

\[ v^2 = 2a^2\dot{\theta}^2(1 + \cos\theta), \]

so that differentiating with respect to t and cancelling $\dot{\theta}$ gives

\[ \ddot{\theta} = \frac{\dot{\theta}^2\sin\theta}{2(1 + \cos\theta)} \]

or, as

\[ \dot{\theta}^2 = \frac{v^2}{2a^2(1 + \cos\theta)}, \]

\[ \ddot{\theta} = \frac{v^2\sin\theta}{4a^2(1 + \cos\theta)^2}. \]

Hence as $\ddot{r} = -a(\cos\theta\,\dot{\theta}^2 + \sin\theta\,\ddot{\theta})$, substituting for r, $\dot{\theta}^2$, and $\ddot{\theta}$ in the radial
component of acceleration we find, as required, that

\[ \ddot{r} - r\dot{\theta}^2 = -\frac{3v^2}{4a} = \text{constant}. \]
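
The constancy of the radial acceleration can be spot-checked by evaluating the intermediate formulas of the solution at several angles. This is only an illustrative sketch: the function name `radial_acceleration` and the sample values a = 2, v = 3 are assumptions, and angles near θ = π are avoided because 1 + cos θ vanishes there.

```python
import math

def radial_acceleration(theta, a, v):
    """Evaluate r'' - r*theta'^2 on the cardioid r = a(1 + cos theta) at constant speed v."""
    c, s = math.cos(theta), math.sin(theta)
    theta_dot_sq = v * v / (2.0 * a * a * (1.0 + c))          # from v^2 = 2 a^2 theta'^2 (1 + cos)
    theta_ddot = v * v * s / (4.0 * a * a * (1.0 + c) ** 2)   # from differentiating v^2 = const
    r = a * (1.0 + c)
    r_ddot = -a * (c * theta_dot_sq + s * theta_ddot)
    return r_ddot - r * theta_dot_sq

a_len, speed = 2.0, 3.0   # illustrative values
values = [radial_acceleration(th, a_len, speed) for th in (0.0, 0.5, 1.0, 2.0, 2.5)]
expected = -3.0 * speed ** 2 / (4.0 * a_len)
print(values, expected)
```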

A vector treatment of particle dynamics follows quite naturally from the
ideas presented so far. Thus a particle of variable mass m moving with velocity
$\mathbf{v}$ has, by definition, the linear momentum $\mathbf{M}$, where

\[ \mathbf{M} = m\mathbf{v}. \]

Now by Newton's second law of motion we know that, with a suitable choice
of units, we may equate the force $\mathbf{F}$ to the rate of change of momentum, so
it follows that we may write

\[ \mathbf{F} = \frac{d\mathbf{M}}{dt}. \]

However,

\[ \frac{d\mathbf{M}}{dt} = \frac{dm}{dt}\mathbf{v} + m\frac{d\mathbf{v}}{dt} \]

and hence

\[ \mathbf{F} = \frac{dm}{dt}\mathbf{v} + m\frac{d\mathbf{v}}{dt}. \tag{11-27} \]

In the case of a particle of constant mass m, we have dm/dt = 0, reducing
Eqn (11-27) to the familiar equation of motion

\[ \mathbf{F} = m\mathbf{a}, \tag{11-28} \]

where $\mathbf{a} = d\mathbf{v}/dt$ is the acceleration.

Similarly, the angular momentum of a particle of fixed mass m about the
origin is defined by the relation $\boldsymbol{\Omega} = \mathbf{r} \times m\mathbf{v}$, where $\mathbf{r}$ is the position vector
of the particle relative to the origin and $\mathbf{v} = d\mathbf{r}/dt$ is its velocity. Then the
rate of change of angular momentum about the origin is

\[ \frac{d\boldsymbol{\Omega}}{dt} = m\mathbf{v} \times \mathbf{v} + m\mathbf{r} \times \frac{d\mathbf{v}}{dt} = \mathbf{r} \times \mathbf{F}, \tag{11-29} \]

since $\mathbf{v} \times \mathbf{v} = \mathbf{0}$ and by virtue of Eqn (11-28). This is the vector form of the
principle of angular momentum, which asserts that the rate of change of angular
momentum about the origin is equal to the moment about the origin of the force
acting on the particle.

The line integral

\[ J = \int_\Gamma \mathbf{F} \cdot d\mathbf{r} \]

also occurs naturally in many contexts, perhaps the simplest of which is in
connection with the work done by a force. If $\mathbf{F}$ is identified with a force, and
$d\mathbf{r}$ is a displacement along some specific curve $\Gamma$ joining points A and B, then
J represents the work done by the varying force $\mathbf{F}$ as it moves its point of
application along the curve $\Gamma$ from A to B (cf. Fig. 11-4).

In the special case that $\mathbf{F}$ is a constant force and $\Gamma$ is a straight line segment
with end points at s = a and s = b this simplifies to an already familiar
result. Suppose that $\mathbf{F} = F\boldsymbol{\alpha}$ and $d\mathbf{r} = ds\,\boldsymbol{\beta}$, where $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$ are constant unit
vectors inclined at an angle $\theta$, then

\[ J = \int_a^b \mathbf{F} \cdot d\mathbf{r} = \int_a^b F(\boldsymbol{\alpha} \cdot \boldsymbol{\beta})\,ds = F(b - a)\cos\theta. \]

Thus, as would be expected in these circumstances, the work done by $\mathbf{F}$ is
the product of the component $F\cos\theta$ of the force $\mathbf{F}$ along the line of motion
and the total displacement (b − a).

The line integral also occurs in fields other than particle dynamics. In
fluid mechanics, for example, if $\mathbf{F}$ is identified with the fluid velocity $\mathbf{v}$ and $\Gamma$
is some closed curve drawn in the fluid, then the scalar quantity $\gamma$ defined by
the line integral

\[ \gamma = \int_\Gamma \mathbf{v} \cdot d\mathbf{r} \]

is called the circulation around the curve $\Gamma$. In more advanced works it is
shown that $\gamma$ provides a measure of the degree of rotational motion present
in a fluid. For a special class of fluid flows known as potential flows the
circulation is everywhere zero, irrespective of the choice of $\Gamma$. These flows are
said to be irrotational and are of fundamental importance. Line integrals
around closed curves are generally denoted by the symbol $\oint$ with the
convention that the path of integration is taken anti-clockwise, so that for the
circulation $\gamma$ we would write

\[ \gamma = \oint_\Gamma \mathbf{v} \cdot d\mathbf{r}. \]

A reversal of the direction of integration around $\Gamma$ would change the sign
of $\gamma$.



An exactly similar application of the line integral occurs in electromagnetic
theory, where the electromotive force (e.m.f.) between the ends A and B of a
wire coinciding with a curve $\Gamma$ is related to the electric field vector $\mathbf{E}$ by the
line integral

\[ \text{e.m.f.} = \int_\Gamma \mathbf{E} \cdot d\mathbf{r}. \]

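Example 11-10 Find the work done by a force $\mathbf{F} = yz\,\mathbf{i} + x\,\mathbf{j} + xz\,\mathbf{k}$ in
moving its point of application along the curve $\Gamma$ defined by $x = t$, $y = t^2$,
$z = t^3$ from the point with parameter t = 1 to the point with parameter
t = 2.

Solution

\[ \text{Work done} = \int_\Gamma \mathbf{F} \cdot d\mathbf{r} = \int_\Gamma (yz\,\mathbf{i} + x\,\mathbf{j} + xz\,\mathbf{k}) \cdot (dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}) = \int_\Gamma yz\,dx + x\,dy + xz\,dz. \]

Now as $x = t$, $y = t^2$, $z = t^3$, it follows that

\[ dx = dt, \quad dy = 2t\,dt, \quad dz = 3t^2\,dt \]

and so, substituting in the above expression, we find

\[ \text{Work done} = \int_1^2 (t^5 + 2t^2 + 3t^6)\,dt = \frac{2923}{42} \text{ units}. \]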

Example 11-11 If the fluid velocity $\mathbf{v} = x^2 y\,\mathbf{i}$, determine the circulation
$\gamma$ of $\mathbf{v}$ around the contour $\Gamma$ comprising the boundary of the rectangle
$x = \pm a$, $y = \pm b$.











[Fig. 11-6: rectangular contour with corners P, Q, R, S on the lines x = ±a, y = ±b, centred on the origin O]


Solution By definition, the circulation $\gamma$ is

\[ \gamma = \oint_\Gamma \mathbf{v} \cdot d\mathbf{r} = \oint_\Gamma x^2 y\,\mathbf{i} \cdot (dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}) = \oint_\Gamma x^2 y\,dx, \]

where the direction of integration is anti-clockwise around $\Gamma$. Now the line
integral around $\Gamma$ may be represented as the sum of four integrals as follows,

\[ \gamma = \int_P^Q x^2 y\,dx + \int_Q^R x^2 y\,dx + \int_R^S x^2 y\,dx + \int_S^P x^2 y\,dx, \]

where the limits refer to the corners of the rectangle in Fig. 11-6.

The first and third integrals vanish since x is constant along PQ and RS,
with the consequence that dx = 0. Along QR, y = b and along SP, y = −b,
so that

\[ \gamma = \int_a^{-a} bx^2\,dx + \int_{-a}^{a} (-bx^2)\,dx = -\frac{4a^3 b}{3}. \]
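
The circulation around a polygonal contour can be estimated side by side, in the same way that the integral was split into four parts above. The following sketch is illustrative only: the helper name `circulation`, the corner coordinates P = (a, −b), Q = (a, b), R = (−a, b), S = (−a, −b), and the sample values a = 1.5, b = 2 are assumptions consistent with an anti-clockwise traversal. Simpson's rule is exact here because the integrand is a polynomial of low degree along each side.

```python
def circulation(v, corners, n=200):
    """Sum Simpson estimates of v . dr along the straight sides joining successive corners."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(corners, corners[1:] + corners[:1]):
        dx, dy = x1 - x0, y1 - y0
        side = 0.0
        for i in range(n + 1):
            s = i / n   # parameter running from 0 to 1 along the side
            weight = 1 if i in (0, n) else (4 if i % 2 else 2)
            vx, vy = v(x0 + s * dx, y0 + s * dy)
            side += weight * (vx * dx + vy * dy)
        total += side / (3.0 * n)
    return total

def velocity(x, y):
    """Planar velocity field v = x^2 y i of Example 11-11."""
    return (x * x * y, 0.0)

a_half, b_half = 1.5, 2.0   # illustrative half-widths of the rectangle
corners = [(a_half, -b_half), (a_half, b_half), (-a_half, b_half), (-a_half, -b_half)]
gamma = circulation(velocity, corners)
print(gamma, -4.0 * a_half ** 3 * b_half / 3.0)
```

Listing the corners in the opposite order reverses the direction of integration and changes the sign of the result.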



11-4 Fields, gradient, and directional derivative 

The scalar function $\phi = \sqrt{1 - x^2} + \sqrt{1 - y^2} + \sqrt{1 - z^2}$ is defined
within and on the cube shaped domain $|x| \le 1$, $|y| \le 1$, $|z| \le 1$ and
assigns a specific number $\phi$ to every point within that region. In the language
of vector analysis, $\phi$ is said to define a scalar field throughout the cube. In
general, any scalar function $\phi$ of position will define a scalar field within its
domain of definition. A typical physical example of a scalar field is provided
by the temperature at each point of a body.

Similarly, if $\mathbf{F}$ is a vector function of position, we say that $\mathbf{F}$ defines a
vector field throughout its domain of definition in the sense that it assigns a
specific vector to each point. Thus the vector function
$\mathbf{F} = \sin x\,\mathbf{i} + xy\,\mathbf{j} + ye^z\,\mathbf{k}$ defines a vector field throughout all space.

As heat flows in the direction of decreasing temperature, it follows that 
associated with the scalar temperature field within a body there must also be 
a vector field which assigns to each point a vector describing the direction and 
maximum rate of flow of heat. Other physical examples of vector fields are 
provided by the velocity field v throughout a fluid, and the magnetic field H 
throughout a region. 

To examine more closely the nature of a scalar field, and to see one way 
in which a special type of vector field arises, we must now define what is 
called the gradient of a scalar function. This is a vector differentiation 
operation that associates a vector field with every continuously differentiable 
scalar function. 



definition 11-5 (gradient of scalar function) If the scalar function
$\phi(x, y, z)$ is a continuously differentiable function with respect to the
independent variables x, y, and z then the gradient of $\phi$, written grad $\phi$, is defined
to be the vector

\[ \operatorname{grad}\phi = \frac{\partial\phi}{\partial x}\mathbf{i} + \frac{\partial\phi}{\partial y}\mathbf{j} + \frac{\partial\phi}{\partial z}\mathbf{k}. \]

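For the moment let it be understood that $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ is a specific
point, and consider a displacement from it $d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}$. Then it
follows from the definition of grad $\phi$ that

\[ d\mathbf{r} \cdot \operatorname{grad}\phi = \frac{\partial\phi}{\partial x}dx + \frac{\partial\phi}{\partial y}dy + \frac{\partial\phi}{\partial z}dz, \]

in which it is supposed that grad $\phi$ is evaluated at $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$. Theorem
5-19 then asserts that the right-hand side of this expression is simply the total
differential $d\phi$ of the scalar function $\phi$, so that we have the result

\[ d\phi = d\mathbf{r} \cdot \operatorname{grad}\phi. \tag{11-27} \]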

If we set $ds = |d\mathbf{r}|$, then $d\mathbf{r}/ds$ is the unit vector in the direction of $d\mathbf{r}$.
Writing $\mathbf{a} = d\mathbf{r}/ds$, Eqn (11-27) is thus seen to be equivalent to

\[ \frac{d\phi}{ds} = \mathbf{a} \cdot \operatorname{grad}\phi. \tag{11-28} \]

Because $\mathbf{a} \cdot \operatorname{grad}\phi$ is the projection of grad $\phi$ along the unit vector $\mathbf{a}$,
expression (11-28) is called the directional derivative of $\phi$ in the direction of $\mathbf{a}$.

In other words, $\mathbf{a} \cdot \operatorname{grad}\phi$ is the rate of change of $\phi$ with respect to distance
measured in the direction of $\mathbf{a}$. We have already utilized the notion of a
directional derivative in connection with the derivation of the Cauchy-Riemann
equations, though at that time neither the term nor vector notation was
employed.

As the largest value of the projection $\mathbf{a} \cdot \operatorname{grad}\phi$ at a point occurs when $\mathbf{a}$
is taken in the same direction as grad $\phi$, it follows that grad $\phi$ points in the
direction in which the directional derivative of $\phi$ takes its maximum value.

In more advanced treatments of the gradient operator it is this last
property that is used to define grad $\phi$, since it is essentially independent of
the coordinate system that is utilized. From this more general point of view
our Definition 11-5 then becomes the interpretation of grad $\phi$ in terms of
rectangular Cartesian coordinates.

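The vector differential operator $\nabla$, pronounced either 'del' or 'nabla', is
defined in terms of rectangular Cartesian coordinates as

\[ \nabla = \mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}. \]

As the name implies, $\nabla$ is a vector differential operator, not a vector. It
only generates a vector when it acts on a suitably differentiable scalar
function. We have the obvious result that

\[ \operatorname{grad}\phi \equiv \frac{\partial\phi}{\partial x}\mathbf{i} + \frac{\partial\phi}{\partial y}\mathbf{j} + \frac{\partial\phi}{\partial z}\mathbf{k} = \left(\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}\right)\phi \equiv \nabla\phi. \tag{11-30} \]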

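Example 11-12 Determine grad $\phi$ if $\phi = z^2\cos(xy - \tfrac{1}{4}\pi)$, and hence
deduce its value at the point $(1, \tfrac{1}{2}\pi, 1)$.

Solution We have

\[ \frac{\partial\phi}{\partial x} = -yz^2\sin(xy - \tfrac{1}{4}\pi), \quad \frac{\partial\phi}{\partial y} = -xz^2\sin(xy - \tfrac{1}{4}\pi) \]

and

\[ \frac{\partial\phi}{\partial z} = 2z\cos(xy - \tfrac{1}{4}\pi). \]

Hence

\[ \operatorname{grad}\phi = \frac{\partial\phi}{\partial x}\mathbf{i} + \frac{\partial\phi}{\partial y}\mathbf{j} + \frac{\partial\phi}{\partial z}\mathbf{k} = -yz^2\sin(xy - \tfrac{1}{4}\pi)\,\mathbf{i} - xz^2\sin(xy - \tfrac{1}{4}\pi)\,\mathbf{j} + 2z\cos(xy - \tfrac{1}{4}\pi)\,\mathbf{k}. \]

At the point $(1, \tfrac{1}{2}\pi, 1)$ we thus have

\[ (\operatorname{grad}\phi)_{(1, \frac{1}{2}\pi, 1)} = \frac{1}{\sqrt{2}}\left(-\tfrac{1}{2}\pi\,\mathbf{i} - \mathbf{j} + 2\mathbf{k}\right). \]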

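Example 11-13 If $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$, and $r = |\mathbf{r}|$, deduce the form taken
by grad $r^n$.

Solution As $r = (x^2 + y^2 + z^2)^{1/2}$, it follows from Eqn (11-30) and the
chain rule that

\[ \operatorname{grad} r^n = \left(\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}\right)r^n = \left(\mathbf{i}\frac{\partial r}{\partial x}\frac{d}{dr} + \mathbf{j}\frac{\partial r}{\partial y}\frac{d}{dr} + \mathbf{k}\frac{\partial r}{\partial z}\frac{d}{dr}\right)r^n = nr^{n-1}\left(\frac{\partial r}{\partial x}\mathbf{i} + \frac{\partial r}{\partial y}\mathbf{j} + \frac{\partial r}{\partial z}\mathbf{k}\right). \]

However,

\[ \frac{\partial r}{\partial x} = \frac{x}{r}, \quad \frac{\partial r}{\partial y} = \frac{y}{r}, \quad \frac{\partial r}{\partial z} = \frac{z}{r}, \]

and so

\[ \operatorname{grad} r^n = nr^{n-2}(x\mathbf{i} + y\mathbf{j} + z\mathbf{k}) = nr^{n-2}\mathbf{r}. \]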

The following theorem is an immediate consequence of the definition of 
the gradient operator and of the operation of partial differentiation. 

theorem 11-6 (properties of gradient operator) If $\phi$ and $\psi$ are two
continuously differentiable scalar functions in some domain D, and a, b are
scalar constants, then

(a) grad a = 0;

(b) grad $(a\phi + b\psi) = a \operatorname{grad}\phi + b \operatorname{grad}\psi$;

(c) grad $(\phi\psi) = \phi \operatorname{grad}\psi + \psi \operatorname{grad}\phi$.

The surfaces $\phi(x, y, z)$ = constant associated with a scalar function $\phi$ are
called level surfaces of $\phi$. If we form the total differential of $\phi$ at a point on a
specific level surface $\phi$ = constant then $d\phi = 0$ and, as in Eqn (5-23), we
obtain the result

\[ \frac{\partial\phi}{\partial x}dx + \frac{\partial\phi}{\partial y}dy + \frac{\partial\phi}{\partial z}dz = 0. \]

This is equivalent to

\[ d\mathbf{r} \cdot \operatorname{grad}\phi = 0, \tag{11-31} \]

where now $d\mathbf{r}$ is constrained to lie in the level surface.

This vector condition shows that grad $\phi$ must be normal to $d\mathbf{r}$, and as $d\mathbf{r}$
is constrained to be an arbitrary tangential vector to the level surface at the
point in question, it follows that the vector grad $\phi$ must be normal to the level
surface. The unit normal $\mathbf{n}$ to the surface is thus $\mathbf{n} = \operatorname{grad}\phi / |\operatorname{grad}\phi|$.
Notice that this normal is unique apart from its sign. This simple argument
has proved the following general result.

theorem 11-7 (normal to level surface) If $\phi$ is a continuously differentiable
scalar function, the unit normal $\mathbf{n}$ to any point of the level surface $\phi$ =
constant is determined by

\[ \mathbf{n} = \frac{\operatorname{grad}\phi}{|\operatorname{grad}\phi|}. \]



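Example 11-14 If $\phi = x^2 + 3xy^2 + yz^3 - 12$, find the unit normal $\mathbf{n}$ to
the level surface $\phi = 3$ at the point (1, 2, 1). Deduce the equation of the
tangent plane to the level surface at this point.

Solution The level surface $\phi = 3$ is defined by the equation $\psi = 0$, where

\[ \psi = x^2 + 3xy^2 + yz^3 - 15 = 0. \]

Hence

\[ \operatorname{grad}\psi = (2x + 3y^2)\mathbf{i} + (6xy + z^3)\mathbf{j} + 3yz^2\mathbf{k}, \]

which, at (1, 2, 1), becomes

\[ (\operatorname{grad}\psi)_{(1,2,1)} = 14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}. \]

As $\psi = 0$ is the desired level surface, it follows from Theorem 11-7 that the
unit normal to this surface at the point (1, 2, 1) must be

\[ \mathbf{n} = \frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{(14)^2 + (13)^2 + (6)^2}} = \frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{401}}. \]

Now the equation of a plane is $\mathbf{n} \cdot \mathbf{r} = p$, where $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ is a
general point on the plane, $\mathbf{n}$ is the unit normal to the plane, and p is its
perpendicular distance from the origin. The point $\mathbf{r}_0 = \mathbf{i} + 2\mathbf{j} + \mathbf{k}$ is a point
on the plane so that $\mathbf{n} \cdot \mathbf{r} = \mathbf{n} \cdot \mathbf{r}_0$ (= p). Hence

\[ \left(\frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{401}}\right) \cdot (x\mathbf{i} + y\mathbf{j} + z\mathbf{k}) = \left(\frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{401}}\right) \cdot (\mathbf{i} + 2\mathbf{j} + \mathbf{k}), \]

showing that the required equation is

\[ 14x + 13y + 6z = 46. \]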
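
The gradient, unit normal, and tangent plane of Example 11-14 can be reproduced approximately with finite differences. This is a sketch under stated assumptions: the helper `grad` uses central differences with an arbitrarily chosen step, and all names are illustrative rather than taken from the text.

```python
import math

def phi(x, y, z):
    """Scalar function of Example 11-14."""
    return x * x + 3 * x * y * y + y * z ** 3 - 12

def grad(f, point, h=1e-6):
    """Central-difference estimate of grad f at the given point."""
    components = []
    for i in range(3):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        components.append((f(*up) - f(*down)) / (2 * h))
    return tuple(components)

p0 = (1.0, 2.0, 1.0)
g = grad(phi, p0)                                   # close to (14, 13, 6)
length = math.sqrt(sum(c * c for c in g))           # close to sqrt(401)
normal = tuple(c / length for c in g)               # unit normal n
plane_rhs = sum(c * x for c, x in zip(g, p0))       # right-hand side of 14x + 13y + 6z = 46
print(phi(*p0), g, normal, plane_rhs)
```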

We have seen how the gradient operator associates a vector field grad $\phi$
with every continuously differentiable scalar field $\phi$. Any vector field
$\mathbf{F} = \operatorname{grad}\phi$ which is expressible as the gradient of a scalar field $\phi$ is called a
conservative vector field, and $\phi$ is then referred to as the scalar potential
associated with the vector field.

This has an important implication when line integrals involving
conservative vector fields are considered. Let us suppose that $\mathbf{F} = \operatorname{grad}\phi$, and
that

\[ J = \int_A^B \mathbf{F} \cdot d\mathbf{r}. \tag{11-32} \]

Then

\[ J = \int_A^B \operatorname{grad}\phi \cdot d\mathbf{r}, \tag{11-33} \]

and by virtue of Eqn (11-27) this can be written

\[ J = \int_A^B d\phi = \phi(B) - \phi(A). \tag{11-34} \]

Hence when $\mathbf{F}$ belongs to a conservative vector field, results (11-32) and (11-34)
show that the line integral J of $\mathbf{F}$ depends only on the end points of the path
of integration, and not on the path itself.

This fundamental result has far reaching consequences and forms the
basis of many important developments, of which gravitational potential
theory is but one. Suppose, for example, that $\mathbf{F}$ is identified with a
conservative force field, then result (11-34) represents the change in the potential
energy of a particle as it moves from point A to point B. That J depends only
on the difference between $\phi(B)$ and $\phi(A)$ and not on the path joining A to B explains
why, when using potential energy considerations in mechanics, no
consideration need be given to the path that is followed.



11-5 An application to fluid mechanics 

Let the velocity field $\mathbf{v}$ in a fluid as a function of position (x, y, z) and the
time t be denoted by

\[ \mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}, \tag{11-35} \]

where $v_i = v_i(x, y, z, t)$ for i = 1, 2, 3. Clearly, if at any fixed time $t = t_1$,
$d\mathbf{r}$ denotes a differential displacement along the line of flow at the point with
position vector $\mathbf{r} = \mathbf{r}(x, y, z, t_1)$ then $d\mathbf{r}$ must be parallel to the velocity
vector $\mathbf{v}$ at that point. Hence the respective components of $d\mathbf{r}$ and $\mathbf{v}$ must be
proportional. The lines determined in this manner, which are everywhere
tangential to the velocity field vector, are called the streamlines of the flow
field. More properly these should be called stream surfaces since in three
space dimensions they correspond to surfaces. In the case of a general vector
field $\mathbf{F}$, not necessarily defining a velocity field, they are called field lines.
The condition that $d\mathbf{r}$ must be parallel to $\mathbf{v}$ implies that the field lines or
streamlines must satisfy the equations

\[ \frac{dx}{v_1} = \frac{dy}{v_2} = \frac{dz}{v_3}. \tag{11-36} \]

Equations of this form are called differential equations, and methods for
their solution will be explored systematically in the last three chapters.

If, now, $\mathbf{r}$ is the position vector of a fluid particle at time t, we have the
obvious vector equation

\[ \frac{d\mathbf{r}}{dt} = \mathbf{v}, \tag{11-37} \]

which implies the three scalar differential equations

\[ \frac{dx}{dt} = v_1, \quad \frac{dy}{dt} = v_2, \quad \frac{dz}{dt} = v_3. \tag{11-38} \]

Together, the solutions of these last three equations define curves called the
particle trajectories. The particle trajectories are functions of the time, and
are so named because they describe the path followed by individual fluid
particles as they pass through the flow field.



Example 11-15 Find the differential equations of the streamlines and
particle trajectories corresponding to the fluid velocity field
$\mathbf{v} = 2t^2 x\,\mathbf{i} + ty\,\mathbf{j} + 3z^2\,\mathbf{k}$.

Solution In this case $v_1 = 2t^2 x$, $v_2 = ty$, and $v_3 = 3z^2$, showing that the
differential equations describing the streamlines are

\[ \frac{dx}{2t^2 x} = \frac{dy}{ty} = \frac{dz}{3z^2} \quad (\text{with } t = \text{constant}), \]

whilst the differential equations describing the particle trajectories are

\[ \frac{dx}{dt} = 2t^2 x, \quad \frac{dy}{dt} = ty, \quad \frac{dz}{dt} = 3z^2. \]

In this case all of these differential equations are of the type called variables 
separable which may be solved by direct integration after some slight re- 
arrangement. Although the main discussion of the solution of differential 
equations will be postponed until the final chapters of this book, it will be 
instructive to solve the ones that have arisen in connection with this problem. 

The differential equations defining the streamlines are equivalent to two
different relationships between the three space variables x, y, and z. We
choose to work with the first and last pairs of the equations which are,
respectively, equivalent to

\[ \frac{dx}{x} = 2t\,\frac{dy}{y} \quad \text{and} \quad \frac{dy}{y} = \frac{t\,dz}{3z^2}, \]

with t regarded as a constant. Taking the antiderivatives of these gives

\[ \log x = 2t\{\log y + \text{constant}\} \quad \text{and} \quad \log y = -\frac{t}{3z} + \text{constant}. \]

Re-arrangement shows that the streamlines or stream surfaces are described
by the equations

\[ x = (C_1 y)^{2t}, \quad y = e^{C_2}e^{-t/3z}, \]

where $C_1$, $C_2$ are arbitrary constants. If flow in the plane $z = z_0$ is considered,
then these equations define a curve that is correctly called a streamline. It
would be the curve

\[ x = C_1^{2t}e^{2C_2 t}e^{-2t^2/3z_0}, \quad y = e^{C_2}e^{-t/3z_0}. \]

The particle trajectories are found in similar fashion by finding the
antiderivatives of

\[ \frac{dx}{x} = 2t^2\,dt, \quad \frac{dy}{y} = t\,dt, \quad \frac{dz}{z^2} = 3\,dt. \]

Hence

\[ \log x = \frac{2}{3}t^3 + \text{constant}, \quad \log y = \frac{t^2}{2} + \text{constant}, \quad -\frac{1}{z} = 3t + \text{constant}, \]

showing that

\[ x = C_3 e^{2t^3/3}, \quad y = C_4 e^{t^2/2}, \quad z = \frac{1}{C_5 - 3t}, \]

where $C_3$, $C_4$, and $C_5$ are arbitrary constants. The position vector of a
particle must thus be

\[ \mathbf{r} = C_3 e^{2t^3/3}\,\mathbf{i} + C_4 e^{t^2/2}\,\mathbf{j} + \frac{1}{C_5 - 3t}\,\mathbf{k}. \]
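
The closed-form trajectories can be compared with a direct numerical integration of Eqns (11-38). The sketch below is illustrative only: it uses the classical fourth-order Runge-Kutta method, which is not discussed in the text, and assumes a hypothetical starting point (1, 1, 0.1) at t = 0, so that $C_3 = 1$, $C_4 = 1$, and $C_5 = 10$.

```python
import math

def velocity(t, p):
    """Velocity field v = 2 t^2 x i + t y j + 3 z^2 k of Example 11-15."""
    x, y, z = p
    return (2 * t * t * x, t * y, 3 * z * z)

def rk4(f, p0, t0, t1, steps=1000):
    """Classical fourth-order Runge-Kutta integration of dp/dt = f(t, p)."""
    h = (t1 - t0) / steps
    t, p = t0, tuple(p0)
    for _ in range(steps):
        k1 = f(t, p)
        k2 = f(t + h / 2, tuple(a + h / 2 * b for a, b in zip(p, k1)))
        k3 = f(t + h / 2, tuple(a + h / 2 * b for a, b in zip(p, k2)))
        k4 = f(t + h, tuple(a + h * b for a, b in zip(p, k3)))
        p = tuple(a + h / 6 * (b + 2 * c + 2 * d + e)
                  for a, b, c, d, e in zip(p, k1, k2, k3, k4))
        t += h
    return p

def closed_form(t, p0):
    """Trajectory through p0 at t = 0: C3 = x0, C4 = y0, C5 = 1/z0."""
    x0, y0, z0 = p0
    return (x0 * math.exp(2 * t ** 3 / 3),
            y0 * math.exp(t * t / 2),
            1.0 / (1.0 / z0 - 3 * t))

start = (1.0, 1.0, 0.1)   # illustrative initial position
numeric = rk4(velocity, start, 0.0, 1.0)
exact = closed_form(1.0, start)
print(numeric, exact)
```

The z-component becomes unbounded as t approaches $C_5/3$, so the integration interval should stay short of that value.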

PROBLEMS 

Section 11-1

11-1 Sketch and give a brief description of the curves described by the following
vector functions of a single real variable t:

(a) $\mathbf{r} = a\cos 2\pi t\,\mathbf{i} + b\sin 2\pi t\,\mathbf{j} + t\,\mathbf{k}$;

(b) $\mathbf{r} = a\cos 2\pi t\,\mathbf{i} + b\sin 2\pi t\,\mathbf{j} + t^2\,\mathbf{k}$;

(c) $\mathbf{r} = t\,\mathbf{i} + t^2\,\mathbf{j} + t^3\,\mathbf{k}$.

11-2 State which of the following vector functions are everywhere continuous and, 
if they have points of discontinuity, where these occur: 

( a > u w=(rr^) i+ (r^)j + ' 2k; 
(b) U (o = (7-Z--2) * + tan 'i + ?e ~' k ; 

(c) $\mathbf{u}(t) = \tanh t\,\mathbf{i} + \cosh t\,\mathbf{j} + t\sinh t\,\mathbf{k}$;
(d)u(0= (^-L^-X i + I sin r| j + 3k. 

11-3 A vector function $\mathbf{u}(t)$ of a real variable t may be assigned left- and right-hand
limits $\mathbf{u}(t_0-) = \lim_{t \to t_0-} \mathbf{u}(t)$ and $\mathbf{u}(t_0+) = \lim_{t \to t_0+} \mathbf{u}(t)$ with respect to the point $t_0$
in an obvious manner. The vector function $\mathbf{u}(t)$ will be continuous at $t = t_0$
if $\mathbf{u}(t_0-) = \mathbf{u}(t_0+)$. Use this concept to determine which of the following
vector functions are continuous at the stated points:

, .. , . sinh 2t . ,. , , 

(a) u(0 = i + /e'j + cosh tk at t = 0; 

(fn _ a n\ 
1 i + cosh t\ + tanh tk at t = a. 






11-4 Form the vector functions $\mathbf{u}(t) + \mathbf{v}(t)$, $\mathbf{u}(t) \times \mathbf{v}(t)$, and the scalar function
$\mathbf{u}(t) \cdot \mathbf{v}(t)$ given that:



„(,) = ,2 i + sinhfj+ jj_lfj 

and 

$\mathbf{v}(t) = 2t\,\mathbf{i} + \cosh t\,\mathbf{j} + \sin t\,\mathbf{k}$.



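11-5 Determine $d\mathbf{u}/dt$ and $d^2\mathbf{u}/dt^2$ for the vectors $\mathbf{u}$ defined in (a) and (c) of
Problem 11-1, and find $d\mathbf{u}/dt$ for the vector

\[ \mathbf{u} = \frac{\mathbf{r}}{r} + (\mathbf{a} \cdot \mathbf{r})\mathbf{b} + \mathbf{a} \times \frac{d^2\mathbf{r}}{dt^2}, \]

where $\mathbf{r} = \mathbf{r}(t)$, $r = |\mathbf{r}|$ and $\mathbf{a}$, $\mathbf{b}$ are constant vectors.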

11-6 The position vector of a particle at time t is

\[ \mathbf{r} = \cos(t - 1)\,\mathbf{i} + \sinh(t - 1)\,\mathbf{j} + \alpha t^3\,\mathbf{k}. \]

Find the condition imposed on $\alpha$ by requiring that at time t = 1 the
acceleration vector is normal to the position vector.

11-7 Find the unit tangent $\mathbf{T}$ to the curve
$\mathbf{r} = t\,\mathbf{i} + t^2\,\mathbf{j} + t^3\,\mathbf{k}$
at the points corresponding to t = 0 and t = 1.
11-8 Prove results (a) to (c) of Theorem 11-3. 

11-9 Prove result (d) of Theorem 11-3: (a) by expansion of the vector product 
u x v followed by subsequent recombination of the results; and (b) using 
determinants. 

11-10 Find B, T, and N when t = Jw, given that 

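$\mathbf{r} = (\cos t + \sin^2 t)\,\mathbf{i} + \sin t(1 - \cos t)\,\mathbf{j} - \cos t\,\mathbf{k}$.
11-11 Find $\mathbf{B}$, $\mathbf{T}$, $\mathbf{N}$, $\kappa$, and $\tau$ for the helix

$\mathbf{r} = (1 - \cos t)\,\mathbf{i} + \sin t\,\mathbf{j} + t\,\mathbf{k}$,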
when t = 1 77. 

11-12 Prove that if r(t) is a suitably differentiable function of the real variable /, 
and s is the arc length along the parametrically defined curve r = r(f), then 
with the usual notation 

dr _ dy 
df dt ' 



dr 2 
Hence show that 



dr djr 
At X dt 2 



dr 2T + K [dt) N - 

t 

•a) «• 



and deduce that 






dr d 2 r 




dr 




d 2 r 


dt * dt 2 


1 


dt'~ dr 2 


dr d 2 r 


o 




dr 


3 


dr * dt 2 






dt 





11-13 Apply the results of Problem 11-12 to deduce $\mathbf{B}$, $\mathbf{T}$, and $\mathbf{N}$ for the curve
\[ \mathbf{r} = t\,\mathbf{i} + t^2\,\mathbf{j} + t^3\,\mathbf{k} \]
when $t = 1$.

11-14 This problem gives an elementary derivation of the radius and circle of curvature for a plane curve. If at a point $(\xi, \eta)$ on a curve $y = f(x)$, a circle of radius $\rho$ and centre $(\alpha, \beta)$ is tangent to the curve and has the same second derivative, then it is called the circle of curvature at $(\xi, \eta)$. The number $\rho$ is called the radius of curvature and $(\alpha, \beta)$ the centre of curvature.

Let the circle of curvature at $(\xi, \eta)$ have the equation $(X - \alpha)^2 + (Y - \beta)^2 = \rho^2$, where $(X, Y)$ is a general point on the circle (see figure). By differentiation of this equation with respect to $X$, and using the tangency condition $dY/dX = f'(\xi)$ at $(\xi, \eta)$, show that

\[ (\xi - \alpha) + (\eta - \beta)\,f'(\xi) = 0. \]

By a further differentiation of the equation with respect to $X$, and by using the equality of second derivatives $d^2Y/dX^2 = f''(\xi)$ at $(\xi, \eta)$, show that

\[ 1 + (f'(\xi))^2 + (\eta - \beta)\,f''(\xi) = 0. \]

Use the fact that $(\xi, \eta)$ lies on the circle of curvature to deduce that:

\[ \alpha = \xi - \frac{f'(\xi)}{f''(\xi)}\left(1 + (f'(\xi))^2\right) \]

and that

\[ \rho = \frac{\left[1 + (f'(\xi))^2\right]^{3/2}}{|f''(\xi)|}. \]

Find the centre and radius of curvature of the curve $y = 1 + x^2$ at the point (1, 1).

11-15 Use the results of Problem 11-12 to show that the circular helix
\[ \mathbf{r} = a\cos t\,\mathbf{i} + a\sin t\,\mathbf{j} + bt\,\mathbf{k} \]
has the constant radius of curvature $\rho = (a^2 + b^2)/a$.

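11-16 Show from the Serret-Frenet equations and the fact that $\mathbf{T} = d\mathbf{r}/ds$, that the torsion $\tau$ may be expressed in the form:

\[ \tau = \frac{\dfrac{d\mathbf{r}}{ds} \cdot \left[\dfrac{d^2\mathbf{r}}{ds^2} \times \dfrac{d^3\mathbf{r}}{ds^3}\right]}{\kappa^2}. \]
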

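11-17 If $\mathbf{r} = \mathbf{r}(t)$, where the parameter $t$ is not the arc length along the curve, prove that

\[ \frac{d\mathbf{r}}{dt} = \frac{d\mathbf{r}}{ds}\frac{ds}{dt}, \]

\[ \frac{d^2\mathbf{r}}{dt^2} = \frac{d^2\mathbf{r}}{ds^2}\left(\frac{ds}{dt}\right)^2 + \frac{d\mathbf{r}}{ds}\frac{d^2 s}{dt^2}, \]

\[ \frac{d^3\mathbf{r}}{dt^3} = \frac{d^3\mathbf{r}}{ds^3}\left(\frac{ds}{dt}\right)^3 + 3\,\frac{d^2\mathbf{r}}{ds^2}\frac{ds}{dt}\frac{d^2 s}{dt^2} + \frac{d\mathbf{r}}{ds}\frac{d^3 s}{dt^3}. \]

Hence show that

\[ \frac{d\mathbf{r}}{dt} \cdot \left(\frac{d^2\mathbf{r}}{dt^2} \times \frac{d^3\mathbf{r}}{dt^3}\right) = \frac{d\mathbf{r}}{ds} \cdot \left(\frac{d^2\mathbf{r}}{ds^2} \times \frac{d^3\mathbf{r}}{ds^3}\right)\left(\frac{ds}{dt}\right)^6. \]

Use the result of Problem 11-12 to deduce that

\[ \tau = \frac{\dfrac{d\mathbf{r}}{dt} \cdot \left[\dfrac{d^2\mathbf{r}}{dt^2} \times \dfrac{d^3\mathbf{r}}{dt^3}\right]}{\left|\dfrac{d\mathbf{r}}{dt} \times \dfrac{d^2\mathbf{r}}{dt^2}\right|^2} \qquad \text{and} \qquad \kappa = \frac{1}{\rho} = \frac{\left|\dfrac{d\mathbf{r}}{dt} \times \dfrac{d^2\mathbf{r}}{dt^2}\right|}{\left|\dfrac{d\mathbf{r}}{dt}\right|^3}. \]
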

11-18 Apply the result of Problem 11-17 to find the torsion $\tau$ of the non-constant
pitch helix



r = cos ti 



. . /e ( — e~ e \ 

I + S1I1/J + / jk. 




Section 11-2 

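11-19 Find the antiderivative of the following two functions $\mathbf{f}(t)$:

(a) $\mathbf{f}(t) = \cosh 2t\,\mathbf{i} + \dfrac{1}{t}\,\mathbf{j} + t^3\,\mathbf{k}$;

(b) $\mathbf{f}(t) = t^2 \sin t\,\mathbf{i} + e^t\,\mathbf{j} + \log t\,\mathbf{k}$.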

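11-20 Verify the following antiderivatives using Definition 11-3:

(a) $\displaystyle\int \left(\mathbf{r} \cdot \frac{d\mathbf{r}}{dt}\right) dt = \tfrac{1}{2}(\mathbf{r} \cdot \mathbf{r}) + C = \tfrac{1}{2} r^2 + C$;

(b) $\displaystyle\int \left(\frac{d\mathbf{r}}{dt} \cdot \frac{d^2\mathbf{r}}{dt^2}\right) dt = \frac{1}{2}\,\frac{d\mathbf{r}}{dt} \cdot \frac{d\mathbf{r}}{dt} + C = \frac{1}{2}\left(\frac{d\mathbf{r}}{dt}\right)^2 + C$;

(c) $\displaystyle\int \mathbf{r} \times \frac{d^2\mathbf{r}}{dt^2}\, dt = \mathbf{r} \times \frac{d\mathbf{r}}{dt} + \mathbf{C}$,

where $C$, $\mathbf{C}$ are arbitrary constants.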

11-21 Use the result of Problem 11-20 to express $d\mathbf{r}/dt$ in terms of $\mathbf{r}$, given that $\mathbf{r}$ satisfies the vector differential equation

\[ \frac{d^2\mathbf{r}}{dt^2} + n^2\,\mathbf{r} = 0. \]

11-22 Evaluate the definite integral 
> 

(t 2 e ( i + t log t\ + t 2 k)dt. 
J\ 

11-23 The displacement of a particle $P$ is given in terms of the time $t$ by

\[ \mathbf{r} = \cos 2t\,\mathbf{i} + \sin 2t\,\mathbf{j} + t^2\,\mathbf{k}. \]

If $v$ and $f$ are the magnitudes of the velocity and acceleration respectively,
show that 

11-24 A point moving in space has acceleration $\cos t\,\mathbf{i} + \sin t\,\mathbf{j}$. Find the equation of its path if it passes through the point $(-1, 0, 0)$ with velocity $-\mathbf{j} + \mathbf{k}$ at time $t = 0$.

11-25 Evaluate the line integral of $\mathbf{F} = xy\,\mathbf{i} + yz\,\mathbf{j} + z\,\mathbf{k}$ along the contour defined by $\mathbf{r} = t\,\mathbf{i} + t^2\,\mathbf{j} + t^3\,\mathbf{k}$ from $t = 0$ to $t = 1$.

11-26 Evaluate the line integral of $\mathbf{F} = 4xy\,\mathbf{i} - 2x^2\,\mathbf{j} + 3z\,\mathbf{k}$ from the origin to the point (2, 1, 0) along the contour:

(a) from the origin to the point (2, 0, 0) and then from the point (2, 0, 0) to the point (2, 1, 0);

(b) from the origin to the point (0, 1, 0) and then from the point (0, 1, 0) to the point (2, 1, 0);

(c) from the origin to the point (2, 1, 0) along the straight line joining these two points.

(Hint: the contours (a), (b), (c) all lie in the plane $z = 0$.)

11-27 Evaluate the line integral of $\mathbf{F} = 4xy\,\mathbf{i} + 2x^2\,\mathbf{j} + 3z\,\mathbf{k}$ from the origin to the point (2, 1, 0) along the contours of Problem 11-26.






Section 11-3 

11-28 A particle moves in a curve given by

\[ r = a(1 - \cos\theta) \quad \text{with} \quad \frac{d\theta}{dt} = 3. \]

Find the components of velocity and acceleration. Show that the velocity is zero when $\theta = 0$. Find the acceleration when $\theta = 0$.

11-29 A particle moves on that portion of the curve r = ae e cos 6 (a = constant) 
for which < 6 < £ir, so that its radial velocity u remains constant. Find 
its transverse velocity and its radial and transverse components of acceleration 
as functions of u and d. 

11-30 If the fluid velocity $\mathbf{v} = y\,\mathbf{i} + 2x\,\mathbf{j}$, determine the circulation $\gamma$ by integrating anti-clockwise around the rectangular contour $x = \pm a$, $y = \pm b$. Show that the sign of $\gamma$ is reversed if the direction of integration is taken clockwise around the same contour.

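11-31 Consider the three rectangular regions (a) $0 < x < 1$, $-1 < y < 1$, (b) $0 < x < 1$, $1 < y < 2$, and (c) $0 < x < 1$, $-1 < y < 2$ and denote their boundary curves by $\Gamma_1$, $\Gamma_2$, and $\Gamma_3$. If $\mathbf{F} = 2y\,\mathbf{i} + x\,\mathbf{j}$, evaluate the three line integrals

\[ J_1 = \int_{\Gamma_1} \mathbf{F} \cdot d\mathbf{r}, \qquad J_2 = \int_{\Gamma_2} \mathbf{F} \cdot d\mathbf{r}, \qquad J_3 = \int_{\Gamma_3} \mathbf{F} \cdot d\mathbf{r}, \]

and hence show that $J_1 + J_2 = J_3$.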

11-32 Given that F = cosj-i + sin*j, evaluate the line integral of F taken anti- 
clockwise around the triangle with vertices at the points (0, 0), (£w, 0), 
(£*•> £")• 

11-33 A vector field $\mathbf{F}$ is said to be irrotational if its line integral around any closed curve $\Gamma$ is zero. By integrating around two conveniently chosen contours, deduce which of the following vector fields are irrotational:

(a) $\mathbf{F} = y \sinh z\,\mathbf{i} + x \sinh z\,\mathbf{j} + xy \cosh z\,\mathbf{k}$;

(b) $\mathbf{F} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}$;

(c) F = xyzH + x 2 z a \ + x 2 yzk. 

Section 11-4 

11-34 Find the gradient of the following functions $\phi$:

(a) $\phi = \cosh xyz$;

(b) $\phi = x^2 + y^2 + z^2$;

(c) $\phi = xy \tanh(x - z)$.

11-35 Find the directional derivative of the following functions $\phi$ in the direction of the vector $(\mathbf{i} + 2\mathbf{j} - 2\mathbf{k})$:

(a) $\phi = 3x^2 + xy^2 + yz$;

(b) $\phi = x^2 yz + \cos y$;

(c) <l> = 1 1 xyz. 

11-36 If new independent variables $\xi$, $\eta$, $\zeta$ are introduced through the equations $\xi = x + \alpha$, $\eta = y + \beta$, and $\zeta = z + \gamma$, where $\alpha$, $\beta$, and $\gamma$ are constants, and $\phi$ is a suitably differentiable function, prove that









Deduce from this result the fact that grad $\phi$ is unchanged by a translation of the origin of the coordinates. This property is described by saying that grad $\phi$ is invariant with respect to a translation of the coordinate system.

11-37 If new independent variables $\xi$, $\eta$, $\zeta$ are introduced through the equations

\[ \xi = a_{11}x + a_{12}y + a_{13}z, \]
\[ \eta = a_{21}x + a_{22}y + a_{23}z, \]
\[ \zeta = a_{31}x + a_{32}y + a_{33}z, \]

and $\phi$ is a suitably differentiable function, prove that



/. 8 . 8 , 8 \ , I. <> .8 . 3 \ 



Deduce from this result the fact that grad $\phi$ is unchanged by a rotation of the coordinate system. This property is described by saying that grad $\phi$ is invariant with respect to a rotation of the coordinate system.

11-38 If $\mathbf{a}$ is a constant vector and $\mathbf{r} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}$, $r = |\mathbf{r}|$, prove that

(a) grad $(\mathbf{a} \cdot \mathbf{r}) = \mathbf{a}$;

(b) gradr = r; 

(c)grad^ =-~ 

11-39 By using the Cartesian representation of grad $\phi$ as expressed in Definition 11-5, prove that

(a) grad $(a\phi + b\psi) = a$ grad $\phi + b$ grad $\psi$;

(b) grad $(\phi\psi) = \phi$ grad $\psi + \psi$ grad $\phi$,

where $a$, $b$ are scalar constants and $\phi$, $\psi$ are suitably differentiable functions.

11-40 A vector field $\mathbf{F}$ will be irrotational if it is expressible in the form $\mathbf{F} = \operatorname{grad} \phi$, with $\phi$ a scalar potential. Find the most general scalar potential $\phi$ that will give rise to the irrotational vector field

\[ \mathbf{F} = \left(x + \tfrac{1}{2} y^2 z^2\right)\mathbf{i} + xyz^2\,\mathbf{j} + xy^2 z\,\mathbf{k}. \]

11-41 Find the unit normal $\mathbf{n}$ to the surface $x^2 + 2y^2 - z^2 - 8 = 0$ at the point (1, 2, 1). Deduce the equation of the tangent plane to the surface at this point.

11-42 Find the unit normal $\mathbf{n}$ to the surface $x^2 - 4y^2 + 2z^2 = 6$ at the point (2, 2, 3). Deduce the equation of the plane which has $\mathbf{n}$ as its normal and which passes through the origin.

11-43 If $(x_0, y_0, z_0)$ is a point on the conic surface

\[ \frac{x^2}{a} + \frac{y^2}{b} + \frac{z^2}{c} = 1, \]

show that the tangent plane to the surface at that point is

\[ \frac{x x_0}{a} + \frac{y y_0}{b} + \frac{z z_0}{c} = 1. \]

11-44 The vector field $\mathbf{F}$ is generated by the scalar potential $\phi = x^2 y$. Verify directly by integration that the line integral of $\mathbf{F}$ along each of the three paths of Problem 11-26 is equal to 4. Confirm this result by using the fact that if $\mathbf{F} = \operatorname{grad} \phi$, then

\[ \int_A^B \mathbf{F} \cdot d\mathbf{r} = \phi(B) - \phi(A). \]


11-45 The Newtonian law of gravitation asserts that the force of attraction between point masses $m_1$, $m_2$ distant $r$ apart acts along the line joining them and has magnitude $(G m_1 m_2)/r^2$, where $G$ is the gravitational constant. Show that this force law corresponds to a potential $\phi = (G m_1 m_2)/r$.

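11-46 If $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$ is a vector field, then the scalar operator $\mathbf{v} \cdot \operatorname{grad}$ expressed in Cartesian coordinates is defined to be

\[ \mathbf{v} \cdot \operatorname{grad} = \mathbf{v} \cdot \nabla = v_1 \frac{\partial}{\partial x} + v_2 \frac{\partial}{\partial y} + v_3 \frac{\partial}{\partial z}. \]

Hence if $\mathbf{F}$, $\phi$ are suitably differentiable vector and scalar fields, respectively, it follows that $\mathbf{v} \cdot \operatorname{grad} \phi$ is a scalar and $\mathbf{v} \cdot \operatorname{grad} \mathbf{F}$ is a vector. Given that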



= x 3 yz 2 , 



find 

(a) v . grad 4>\ 

(b) v . grad F; 

(c) v . grad v. 



y = xyi+yj + xzk, F = xH + y*] - z 2 k, 



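11-47 Special differential operators called the divergence and the curl of a vector can be defined in terms of Cartesian coordinates by means of the operator $\nabla$. If $\mathbf{F} = F_1\mathbf{i} + F_2\mathbf{j} + F_3\mathbf{k}$ is a suitably differentiable vector field, then the divergence of $\mathbf{F}$ is denoted either by div $\mathbf{F}$ or $\nabla \cdot \mathbf{F}$ and is the scalar defined by

\[ \operatorname{div} \mathbf{F} = \nabla \cdot \mathbf{F} = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}. \]

The curl of $\mathbf{F}$ is denoted either by curl $\mathbf{F}$ or $\nabla \times \mathbf{F}$ and is the vector defined by

\[ \operatorname{curl} \mathbf{F} = \nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ F_1 & F_2 & F_3 \end{vmatrix}. \]

Show that

\[ \operatorname{curl} \mathbf{F} = \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right)\mathbf{i} + \left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right)\mathbf{j} + \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)\mathbf{k}. \]

If $\phi$ is a suitably differentiable scalar function show by direct substitution into the definitions that

(a) div $(\phi\mathbf{F}) = \mathbf{F} \cdot \operatorname{grad} \phi + \phi \operatorname{div} \mathbf{F}$;

(b) curl $(\phi\mathbf{F}) = \operatorname{grad} \phi \times \mathbf{F} + \phi \operatorname{curl} \mathbf{F}$.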

11-48 Find $\nabla \cdot \mathbf{F}$ and $\nabla \times \mathbf{F}$ given that
\[ \mathbf{F} = x^2 y^2\,\mathbf{i} + y^2 z^2\,\mathbf{j} + xz\,\mathbf{k}. \]




11-49 Prove from the definitions that

(a) curl grad $\phi = \mathbf{0}$,

(b) div curl $\mathbf{F} = 0$,

where $\phi$, $\mathbf{F}$ are suitably differentiable scalar and vector functions respectively.

11-50 Give an example of a differentiable scalar potential $\phi$ and vector field $\mathbf{F}$. Use them to confirm the results of Problem 11-49 by means of direct differentiation.

11-51 In the slow one-dimensional flow of a viscous fluid between parallel plates 
the velocity field has the form 



-(-*) 



k, 



where the plates coincide with the planes x = ±d and the z-axis points in 
the direction of flow. By selecting a convenient contour in the (x, z)-plane, 
prove that the circulation is non-zero so that the flow cannot be irrotational. 

Section 11-5 

11-52 The velocity field describing a fluid flow is

\[ \mathbf{v} = 2\,\mathbf{i} + yt\,\mathbf{j} + \mathbf{k}. \]

Write down the differential equations describing the streamlines and the particle trajectories and solve them as in Example 11-15 in the text.



Series, Taylor's theorem 
and its uses 



12-1 Series 

The term series denotes the sum of the members of a sequence of numbers $\{a_n\}$, in which $a_n$ represents the general term. The number of terms added may be finite or infinite, according as the sequence used is finite or infinite in the sense of Chapter 3. The sum to $N$ terms of the infinite sequence $\{a_n\}$ is written

\[ a_1 + a_2 + \cdots + a_N = \sum_{n=1}^{N} a_n, \]

and it is called a finite series because the number of terms involved in the summation is finite. The so-called infinite series derived from the infinite sequence $\{a_n\}$ by the addition of all its terms is written

\[ a_1 + a_2 + \cdots + a_r + \cdots = \sum_{n=1}^{\infty} a_n. \]

The following are specific examples of numerical series of essentially 
different types: 



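(a) $\displaystyle\sum_{n=1}^{N} n^2 = 1^2 + 2^2 + \cdots + N^2$,

in which the general term $a_n = n^2$;

(b) $\displaystyle\sum_{n=0}^{\infty} \frac{1}{n!} = 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{r!} + \cdots$,

in which the general term $a_n = 1/n!$;

(c) $\displaystyle\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{r} + \cdots$,

in which the general term $a_n = 1/n$;

(d) $\displaystyle\sum_{n=1}^{\infty} \frac{2n^2 + 1}{4n + 2} = \frac{1}{2} + \frac{9}{10} + \frac{19}{14} + \cdots + \frac{2r^2 + 1}{4r + 2} + \cdots$,

in which the general term $a_n = (2n^2 + 1)/(4n + 2)$;

(e) $\displaystyle\sum_{n=1}^{\infty} (-1)^{n+1} = 1 - 1 + 1 - 1 + \cdots + (-1)^{r+1} + \cdots$,

in which the general term $a_n = (-1)^{n+1}$.

Only (a) is a finite series; the remainder are infinite.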

There is obviously no difficulty in assigning a sum to a finite series, but how are we to do this in the case of an infinite series? A practical approach would be to attempt to approximate the infinite series by means of a finite series comprising only its first $N$ terms. To justify this it would be necessary to show in some way that the sum of the remainder $R_N$ of the series after $N$ terms tends to zero as $N$ increases and, even better if possible, to obtain an upper bound for $R_N$. This was, of course, the approach adopted in Chapter 6 when discussing the exponential series which comprises example (b). In the event of an upper bound for $R_N$ being available, this could be used to deduce the number of terms that need be taken in order to determine the sum to within a specified accuracy.

The spirit of this practical approach to the summation of series is exactly 
what is adopted in a rigorous discussion of series. The first question to be 
determined is whether or not a given series has a unique sum; the estimation 
of the remainder term follows afterwards, and usually proves to be more 
difficult. 

To assist us in our formal discussion of series we use the already familiar notion of the $n$th partial sum $S_n$ of the series $\sum_{n=1}^{\infty} a_n$, which is defined to be the finite sum

\[ S_n = \sum_{r=1}^{n} a_r = a_1 + a_2 + \cdots + a_n. \]

Then, in terms of $S_n$, we have the following definition of convergence, which is in complete agreement with the approach we have just outlined.

Definition 12-1 (convergence of series) The series $\sum_{n=1}^{\infty} a_n$ will be said to be convergent to the finite sum $S$ if its $n$th partial sum $S_n$ is such that

\[ \lim_{n \to \infty} S_n = S. \]

If the limit of $S_n$ is not defined, or is infinite, the series will be said to be divergent.

The remainder after $n$ terms, $R_n$, is given by

\[ R_n = a_{n+1} + a_{n+2} + \cdots + a_{n+r} + \cdots, \]

so that if $\{S_n\}$ converges to the limit $S$, then $R_n = S - S_n$ and Definition 12-1 is obviously equivalent to requiring that

\[ \lim_{n \to \infty} (S - S_n) = \lim_{n \to \infty} R_n = 0. \]

Example 12-1 Find the $n$th partial sum of the series

\[ 1 + \frac{1}{3} + \frac{1}{9} + \frac{1}{27} + \cdots + \frac{1}{3^n} + \cdots \]

and hence show that it converges to the sum 3/2. Find the remainder after $n$ terms and deduce how many terms need be summed in order to yield a result in which the error does not exceed 0.01.

Solution This series is a geometric progression with initial term unity and common ratio 1/3. Its sum to $n$ terms, which is the desired $n$th partial sum $S_n$, may be determined by a well known formula (see Problem 12-2) which gives

\[ S_n = \frac{1 - (1/3)^n}{1 - (1/3)} = \frac{3}{2}\left[1 - (1/3)^n\right]. \]

We have

\[ \lim_{n \to \infty} S_n = \lim_{n \to \infty} \frac{3}{2}\left[1 - \left(\frac{1}{3}\right)^n\right] = 3/2, \]

showing that the series is convergent to the sum 3/2.

As $S_n$ is the sum to $n$ terms, the remainder after $n$ terms, $R_n$, must be given by $R_n = 3/2 - S_n$, and so

\[ R_n = \frac{1}{2}\left(\frac{1}{3}\right)^{n-1}. \]

If the remainder must not exceed 0.01, then $R_n \le 0.01$, from which it is easily seen that the number $n$ of terms needed is $n \ge 5$. The determination of $R_n$ was simple in this instance because we were fortunate enough to have available an explicit formula for $S_n$. In general such a formula is seldom available.
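
The arithmetic of Example 12-1 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `geometric_partial_sum` and `terms_needed` are illustrative choices, and exact rational arithmetic is used so that no rounding enters the comparison with the tolerance.

```python
from fractions import Fraction


def geometric_partial_sum(n, ratio=Fraction(1, 3)):
    """Sum of the first n terms 1 + ratio + ... + ratio**(n - 1)."""
    return sum(ratio**k for k in range(n))


def terms_needed(tolerance, ratio=Fraction(1, 3)):
    """Smallest n for which the remainder S - S_n does not exceed tolerance."""
    limit = 1 / (1 - ratio)  # sum of the infinite series (3/2 for ratio 1/3)
    n = 1
    while limit - geometric_partial_sum(n, ratio) > tolerance:
        n += 1
    return n


if __name__ == "__main__":
    print(geometric_partial_sum(5))        # exact S_5 as a fraction
    print(terms_needed(Fraction(1, 100)))  # terms needed for error <= 0.01
```

The loop simply increases $n$ until $R_n = 3/2 - S_n$ falls to the tolerance, which mirrors the reasoning in the solution.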

The definition of convergence has immediate consequences as regards the addition and subtraction of series. Suppose $\sum a_n$ and $\sum b_n$ are convergent series with sums $\alpha$, $\beta$. (It is customary to omit summation limits when they are not important.) Let their respective partial sums be $S_n = a_1 + a_2 + \cdots + a_n$, $S_n' = b_1 + b_2 + \cdots + b_n$ and consider the series $\sum (a_n + b_n)$ which has the partial sum $S_n'' = S_n + S_n'$. Then

\[ \lim_{n \to \infty} S_n'' = \lim_{n \to \infty} (S_n + S_n') = \lim_{n \to \infty} S_n + \lim_{n \to \infty} S_n' = \alpha + \beta, \]

showing that

\[ \sum_{n=1}^{\infty} (a_n + b_n) = \alpha + \beta. \]

A corresponding result for the difference of two series may be proved in similar fashion. We have established the following general result.

Theorem 12-1 (sum and difference of convergent series) If the series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are convergent to the respective sums $\alpha$ and $\beta$, then

\[ \sum_{n=1}^{\infty} (a_n + b_n) = \alpha + \beta; \qquad \sum_{n=1}^{\infty} (a_n - b_n) = \alpha - \beta. \]

Example 12-2 Suppose that $a_n = (1/2)^n$ and $b_n = (1/3)^n$, so that the series involved are again geometric progressions with $\sum_{n=0}^{\infty} (1/2)^n = 2$ and $\sum_{n=0}^{\infty} (1/3)^n = 3/2$. Then it follows from Theorem 12-1 that

\[ \sum_{n=0}^{\infty} \left[(1/2)^n + (1/3)^n\right] = 7/2 \quad \text{and} \quad \sum_{n=0}^{\infty} \left[(1/2)^n - (1/3)^n\right] = 1/2. \]

Let us now derive a number of standard tests by which the convergence or divergence of a series may be established. We begin with a test for divergence.

Suppose first that a series $\sum a_n$ with $n$th partial sum $S_n$ converges to the sum $S$. Then from our discussion of the convergence of a sequence given in Chapter 3, we know that for any $\varepsilon > 0$ there must exist some integer $N$ such that

\[ |S_n - S| < \varepsilon \quad \text{for } n > N. \]

This immediately implies the additional result

\[ |S_{n+1} - S| < \varepsilon. \]

Hence,

\[ \varepsilon + \varepsilon > |S_{n+1} - S| + |S_n - S| = |S_{n+1} - S| + |S - S_n| \ge |S_{n+1} - S_n|. \]

However, as $S_{n+1} - S_n = a_{n+1}$, we have proved that

\[ |a_{n+1}| < 2\varepsilon \quad \text{for } n > N. \]

As $\varepsilon$ was arbitrary, this shows that for a series to be convergent, it is necessary that

\[ \lim_{n \to \infty} |a_n| = 0 \]

or, equivalently,

\[ \lim_{n \to \infty} a_n = 0. \]

If this is not the case then the series $\sum_{n=1}^{\infty} a_n$ must diverge. This condition thus provides us with a positive test for divergence.

Theorem 12-2(a) (test for divergence) The series $\sum_{n=1}^{\infty} a_n$ diverges if $\lim_{n \to \infty} a_n \ne 0$.

This theorem shows, for example, that the series (d) is divergent, because $a_n = (2n^2 + 1)/(4n + 2)$, and hence it increases without bound as $n$ increases.

It is important to take note of the fact that this theorem gives no information in the event that $\lim_{n \to \infty} a_n = 0$. Although we have shown that this is a necessary condition for convergence, it is not a sufficient condition because divergent series exist for which the condition is true.

Theorem 12-2(a) gives no information about either series (b) or (c) as in each case $\lim_{n \to \infty} a_n = 0$. In fact, by using another argument, we have already proved that the series representation for $e$ in (b) is convergent, whereas we shall prove shortly that the harmonic series (c) is divergent. Series (e) must also be divergent according to our definition, because $a_n$ oscillates finitely between 1 and $-1$, and also $S_n$ does not tend to any limit.

The terms of series are not always of the same sign, and so it is useful to associate with the series $\sum a_n$ the companion series $\sum |a_n|$. If this latter series is convergent, then the series $\sum a_n$ is said to be absolutely convergent. It can happen that although $\sum a_n$ is convergent, $\sum |a_n|$ is divergent. When this occurs the series $\sum a_n$ is said to be conditionally convergent. Now when terms of differing signs are involved, the sum of the absolute values of the terms of a series clearly exceeds the sum of the terms of the series, and so it seems reasonable to expect that absolute convergence implies convergence. Let us prove this fact.

Theorem 12-2(b) (absolute convergence implies convergence) If the series $\sum_{n=1}^{\infty} |a_n|$ is convergent, then so also is the series $\sum_{n=1}^{\infty} a_n$.

Proof The proof of this result is simple. Let $S_n = |a_1| + |a_2| + \cdots + |a_n|$ and $S_n' = a_1 + a_2 + \cdots + a_n$ be the $n$th partial sums, respectively, of the two series in Theorem 12-2(b). Then, as $a_r + |a_r|$ is either zero or $2|a_r|$, it follows that

\[ 0 \le S_n + S_n' \le 2S_n. \]

Now by supposition $\lim_{n \to \infty} S_n = S$ exists, so that taking limits we arrive at

\[ 0 \le \lim_{n \to \infty} (S_n + S_n') \le 2S. \]

The limit in the middle exists because the partial sums $S_n + S_n'$ of the series with $n$th term $a_n + |a_n|$ are non-decreasing, every term being non-negative, and are bounded above by $2S$. This implies that the series with $n$th term $a_n + |a_n|$ must be convergent and hence, using Theorem 12-1 to subtract the convergent series $\sum |a_n|$, that $\sum a_n$ must be convergent.

Example 12-3 Consider the series

\[ \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} = 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots. \]

As $a_n = (-1)^n/n!$, we have $|a_n| = 1/n!$, which is the general term of the exponential series defining $e$. Thus Theorem 12-2(b), and the convergence of the exponential series, together imply the convergence of $\sum_{n=0}^{\infty} (-1)^n/n!$. In fact this is the series representation of $1/e$.
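
A quick numerical check of the closing remark of Example 12-3 is sketched below. It is illustrative only: the function name is hypothetical, and the series is assumed to start at $n = 0$, as it must for the sum to equal $1/e$.

```python
import math


def alternating_exponential_partial_sum(n_terms):
    """Sum of (-1)**n / n! for n = 0, 1, ..., n_terms - 1."""
    total = 0.0
    term = 1.0  # the n = 0 term
    for n in range(n_terms):
        total += term
        term *= -1.0 / (n + 1)  # next term: (-1)**(n + 1) / (n + 1)!
    return total


if __name__ == "__main__":
    for n_terms in (5, 10, 20):
        # Partial sum followed by the value 1/e it should approach.
        print(n_terms, alternating_exponential_partial_sum(n_terms), math.exp(-1))
```

Each term is obtained from the previous one by a single multiplication, which avoids forming large factorials explicitly.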

Suppose $\sum b_n$ is a convergent series of positive terms, and that $\sum a_n$ is a series with the property that if $N$ is some positive integer, then $|a_n| \le b_n$ for $n > N$. Then clearly the convergence of $\sum b_n$ implies the convergence of $\sum |a_n|$ and, by Theorem 12-2(b), also the convergence of $\sum a_n$. By a similar argument, if for $n > N$, $0 \le b_n \le a_n$, and $\sum b_n$ is known to be divergent, then clearly $\sum a_n$ must also be divergent. We incorporate these results into a useful comparison test.

Theorem 12-3 (comparison test)

(a) Convergence test Let $\sum b_n$ be a convergent series of positive terms, and let $\sum a_n$ be a series with the property that there exists a positive integer $N$ such that

\[ |a_n| \le b_n \quad \text{for } n > N. \]

Then $\sum a_n$ is an absolutely convergent series.

(b) Divergence test Let $\sum b_n$ be a divergent series of positive terms, and let $\sum a_n$ be a series of positive terms with the property that there exists a positive integer $N$ such that

\[ 0 \le b_n \le a_n \quad \text{for } n > N. \]

Then $\sum a_n$ is a divergent series.



Example 12-4

(a) Consider the series $\sum_{n=1}^{\infty} [2 + (-1)^n]/2^n$. We have

\[ a_n = \frac{2 + (-1)^n}{2^n} \le \frac{3}{2^n}, \]

and as $\sum_{n=1}^{\infty} 3/2^n = 3 \sum_{n=1}^{\infty} 1/2^n = 3$, the conditions of Theorem 12-3 (a) are satisfied if we set $b_n = 3/2^n$. It thus follows that the series $\sum a_n$ is convergent.

(b) Consider the series $\sum_{n=1}^{\infty} (n + 1)/n^2$. Here we have

\[ a_n = \frac{n + 1}{n^2} = \frac{1}{n}\left(\frac{n + 1}{n}\right) > \frac{1}{n}, \]

and as the harmonic series $\sum 1/n$ is divergent, the conditions of Theorem 12-3 (b) are satisfied when we set $b_n = 1/n$. Hence $\sum a_n$ is divergent.
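
The comparison in Example 12-4 (a) can be illustrated with exact arithmetic. This is a sketch under stated assumptions rather than anything taken from the text: the helper names `a`, `b` and `partial_sum` are hypothetical, and the summation is taken to start at $n = 1$.

```python
from fractions import Fraction


def a(n):
    """General term of Example 12-4 (a): (2 + (-1)**n) / 2**n."""
    return Fraction(2 + (-1)**n, 2**n)


def b(n):
    """Comparison term 3 / 2**n."""
    return Fraction(3, 2**n)


def partial_sum(term, n_terms):
    """Sum of term(n) for n = 1, ..., n_terms."""
    return sum(term(n) for n in range(1, n_terms + 1))


if __name__ == "__main__":
    for n_terms in (5, 10, 40):
        # The first column stays below the second, which stays below 3.
        print(n_terms, float(partial_sum(a, n_terms)), float(partial_sum(b, n_terms)))
```

Splitting the series into its two geometric parts, $\sum 2/2^n = 2$ and $\sum (-1/2)^n = -1/3$, suggests that the partial sums of $\sum a_n$ approach 5/3, comfortably below the bound 3 supplied by the comparison series.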




Fig. 12-1 Comparison between series and integral: panels (a) and (b).

A powerful test for the convergence or divergence of a series $\sum a_n$ of positive terms follows by a comparison of the shaded rectangles in Fig. 12-1.

Let $f(x)$ be a non-increasing function defined for $1 \le x < \infty$ which decreases to zero as $x$ tends to infinity, and let $f(n) = a_n$, where $n$ is an integer. Then we have the obvious inequality

\[ \sum_{r=2}^{n} f(r) \le \int_1^n f(x)\,dx \le \sum_{r=1}^{n} f(r) \]

or, equivalently,

\[ \sum_{r=2}^{n} a_r \le \int_1^n f(x)\,dx \le \sum_{r=1}^{n} a_r. \]

As the right-hand side of this inequality only exceeds the left-hand side by the single term $a_1$, it must follow that in the limit, the infinite series $\sum a_r$ and the integral

\[ \lim_{n \to \infty} \int_1^n f(x)\,dx \]

converge or diverge together. This conclusion may be incorporated into a test as follows.

Theorem 12-4 (integral test) Let $f(x)$ be a positive non-increasing function defined on $1 \le x < \infty$ with $\lim_{x \to \infty} f(x) = 0$. Then, if $a_n = f(n)$, the series $\sum_{n=1}^{\infty} a_n$ converges or diverges according as

\[ \int_1^{\infty} f(x)\,dx \]

is finite or infinite.

Corollary 12-4 ($R_N$ deduced from integral test) Let $f(x)$ be a positive non-increasing function defined on $1 \le x < \infty$ with $\lim_{x \to \infty} f(x) = 0$, and let $\sum_{n=1}^{\infty} a_n$ be convergent, where $a_n = f(n)$. Then the remainder $R_N$ after $N$ terms satisfies the inequality

\[ R_N \le \int_N^{\infty} f(x)\,dx. \]

Proof The result follows at once from the obvious inequality

\[ \sum_{r=N+1}^{N'} a_r \le \int_N^{N'} f(x)\,dx \le \sum_{r=N}^{N'} a_r \]

by taking the limit as $N' \to \infty$. This is possible because, by hypothesis, $\sum a_n$ is convergent so that the improper integral involved exists.



Example 12-5

(a) Consider the series $\sum_{n=1}^{\infty} 1/n^k$, where $k > 0$. Then the function $f(x) = 1/x^k$ satisfies the conditions of Theorem 12-4. Hence this series converges or diverges according as

\[ \lim_{n \to \infty} \int_1^n \frac{dx}{x^k} \]

is finite or infinite. If $k \ne 1$ we have

\[ \lim_{n \to \infty} \int_1^n \frac{dx}{x^k} = \left(\frac{1}{1 - k}\right) \lim_{n \to \infty} \left[\frac{1}{n^{k-1}} - 1\right]. \]

Hence for $0 < k < 1$ this limit is infinite, showing that the series is divergent for $k$ in this range, whereas for $k > 1$ this limit has the finite value $1/(k - 1)$, showing that the series is convergent for $k > 1$. Applying Corollary 12-4 shows that when $k > 1$, the remainder $R_N$ after $N$ terms must satisfy the inequality

\[ R_N < \frac{N^{1-k}}{k - 1}. \]

When $k = 1$ we obtain the harmonic series, which must be treated separately. As it follows that

\[ \lim_{n \to \infty} \int_1^n \frac{dx}{x} = \lim_{n \to \infty} \log n \to \infty, \]

we have proved that the harmonic series is divergent.

(b) Consider the series $\sum_{n=1}^{\infty} n/(1 + n^2)$. Here we set $f(x) = x/(1 + x^2)$, so we must examine

\[ L = \lim_{n \to \infty} \int_1^n \frac{x\,dx}{1 + x^2}. \]

Setting $x^2 = u$ we find

\[ L = \lim_{n \to \infty} \tfrac{1}{2}\left[\log(1 + n^2) - \log 2\right] \to \infty. \]

Hence the series is divergent.
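
The remainder bound of Corollary 12-4 can be compared with the actual remainder in the case $k = 2$ of Example 12-5 (a). The sketch below is illustrative only; it relies on the standard value $\sum 1/n^2 = \pi^2/6$, which is not derived in this section, and the function names are hypothetical.

```python
import math


def p_series_partial_sum(N, k=2.0):
    """Sum of 1 / n**k for n = 1, ..., N."""
    return sum(1.0 / n**k for n in range(1, N + 1))


def integral_remainder_bound(N, k=2.0):
    """N**(1 - k) / (k - 1), the bound on R_N from Corollary 12-4 (needs k > 1)."""
    return N**(1.0 - k) / (k - 1.0)


if __name__ == "__main__":
    exact = math.pi**2 / 6  # known sum of the k = 2 series (assumed, not from the text)
    for N in (10, 100, 1000):
        remainder = exact - p_series_partial_sum(N)
        # The true remainder should lie below the integral bound 1/N.
        print(N, remainder, integral_remainder_bound(N))
```

For $k = 2$ the bound is simply $1/N$, so roughly 100 terms are needed to guarantee an error below 0.01, in contrast with the five terms that sufficed for the geometric series of Example 12-1.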

Two other useful tests known as the ratio test and the $n$th root test may be derived from Theorem 12-3, essentially using a geometric progression for purposes of comparison. The idea involved in these tests is that a series is tested against itself, and that its convergence or divergence is then deduced from the rate at which successive terms decrease or increase.

Suppose that $\sum a_n$ is a series for which the ratio $a_{n+1}/a_n$ is always defined and that $\lim_{n \to \infty} |a_{n+1}/a_n| = L$, where $L < 1$. Let $r$ be some fixed number such that $L < r < 1$. Then the existence of the limit $L$ implies that there exists an integer $N$ such that

\[ |a_{n+1}| < r\,|a_n| \quad \text{for } n > N. \]

Hence it follows that

\[ |a_{N+2}| < r\,|a_{N+1}|, \quad |a_{N+3}| < r\,|a_{N+2}| < r^2\,|a_{N+1}|, \ldots, \]

and in general

\[ |a_{N+m+1}| < r^m\,|a_{N+1}|. \]

Thus if $R_N$ is the remainder after $N$ terms we have

\[ R_N = \sum_{n=N+1}^{\infty} a_n \le \sum_{n=N+1}^{\infty} |a_n| < |a_{N+1}|\,(1 + r + r^2 + \cdots). \qquad (*) \]

The expression in brackets is a convergent geometric progression because, by hypothesis, $r < 1$. As the remainder term $R_N$ is finite, and is less than the sum of the absolute values of the terms comprising the tail of the series, it is easily seen that the series $\sum a_n$ must be absolutely convergent. If $L > 1$ the terms grow in size, and the series $\sum a_n$ is divergent. Nothing may be deduced if $L = 1$ for then the series may either be convergent or divergent as illustrated by Example 12-5 (a). In that case $a_{n+1}/a_n = n^k/(n + 1)^k$, giving $\lim |a_{n+1}/a_n| = 1$; and the series was seen to be divergent for $0 < k \le 1$ and convergent for $k > 1$.

Expressed formally, as follows, these results are called the ratio test.

Theorem 12-5 (ratio test) If the series $\sum_{n=1}^{\infty} a_n$ is such that $a_n \ne 0$ and

\[ \lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = L, \]

then

(a) the series $\sum a_n$ converges absolutely if $L < 1$,

(b) the series $\sum a_n$ diverges if $L > 1$,

(c) the test fails if $L = 1$.



Example 12-6

(a) Consider the series

\[ \sum_{n=1}^{\infty} \frac{(-1)^n\, n!}{n^n}. \]

Then $a_n \ne 0$ and

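\[ \frac{a_{n+1}}{a_n} = (-1)^{2n+1}\,\frac{(n + 1)!\,n^n}{(n + 1)^{n+1}\,n!} = (-1)^{2n+1}\left(1 + \frac{1}{n}\right)^{-n}. \]

Hence

\[ \lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n \to \infty} \frac{1}{\left(1 + \dfrac{1}{n}\right)^n} = \frac{1}{e}, \]

where the final result follows by virtue of the work of Section 3-3. As $e > 1$, so that $1/e < 1$, the ratio test proves the absolute convergence of this series.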



(b) Consider the series $\sum_{n=1}^{\infty} 1/n!$. Here $a_n = 1/n! \ne 0$ and

\[ \frac{a_{n+1}}{a_n} = \frac{n!}{(n + 1)!} = \frac{1}{n + 1}. \]

Hence

\[ \lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n \to \infty} \frac{1}{n + 1} = 0, \]

and as $0 < 1$ the ratio test proves the series to be convergent.

(c) Consider the series $\sum_{n=1}^{\infty} 3^n/n$.

Then $a_n \ne 0$ and

\[ \frac{a_{n+1}}{a_n} = \left(\frac{3^{n+1}}{n + 1}\right)\left(\frac{n}{3^n}\right) = \frac{3n}{n + 1}. \]

Now

\[ \lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n \to \infty} \frac{3n}{n + 1} = 3, \]

and as $3 > 1$ the ratio test proves the series to be divergent.

(d) Consider the series $\sum_{n=1}^{\infty} 1/(2n + 1)^2$.

Then $a_n \ne 0$ and

\[ \frac{a_{n+1}}{a_n} = \left(\frac{2n + 1}{2n + 3}\right)^2. \]

Now

\[ \lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = \lim_{n \to \infty} \left(\frac{2n + 1}{2n + 3}\right)^2 = 1, \]

so that the ratio test fails in this case. In fact the series is convergent, as may readily be proved by use either of the comparison test, with $b_n = 1/n^2$, or the integral test.
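
The limiting ratios found in Example 12-6 can be observed numerically. The following sketch is illustrative: the dictionary `examples` and the function `ratio` are hypothetical names, and the sign $(-1)^n$ in series (a) is an assumption that does not affect the absolute ratio.

```python
import math
from fractions import Fraction
from math import factorial


def ratio(term, n):
    """Absolute ratio |a_(n+1) / a_n| for a series with general term term(n)."""
    return abs(term(n + 1) / term(n))


examples = {
    "(a) (-1)^n n!/n^n": lambda n: Fraction((-1)**n * factorial(n), n**n),
    "(b) 1/n!": lambda n: Fraction(1, factorial(n)),
    "(c) 3^n/n": lambda n: Fraction(3**n, n),
    "(d) 1/(2n+1)^2": lambda n: Fraction(1, (2 * n + 1)**2),
}

if __name__ == "__main__":
    print("1/e =", math.exp(-1))
    for label, term in examples.items():
        # Ratios at n = 10, 100, 1000 drift towards the limit L of the ratio test.
        print(label, [float(ratio(term, n)) for n in (10, 100, 1000)])
```

Exact fractions are used so that the very large factorials and powers cancel before any conversion to floating point takes place.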

As the remainder term $R_N$ used in the proof of the ratio test may be either positive or negative, the estimate (*) is equivalent to

\[ |R_N| < |a_{N+1}|\,(1 + r + r^2 + \cdots) \]

or, summing the geometric progression, to

\[ |R_N| < \frac{|a_{N+1}|}{1 - r}. \]

This simple result provides an estimate of the error if the summation is terminated after $N$ terms and comprises our next result.

Corollary 12-5 ($R_N$ deduced from ratio test) Let the series $\sum_{n=1}^{\infty} a_n$ be convergent, and let the ratio test be applicable with

\[ \lim_{n \to \infty} \left|\frac{a_{n+1}}{a_n}\right| = L. \]

Then, if $r$ is a number such that $L < r < 1$, and $N$ is large enough for $|a_{n+1}/a_n| < r$ to hold for all $n > N$, the remainder $R_N$ after $N$ terms is such that

\[ |R_N| < \frac{|a_{N+1}|}{1 - r}. \]

Let us use Example 12-6 (a) to illustrate this and to compute a bound for $|R_4|$. We have $L = 1/e$ and, as $e = 2.7182\ldots$, we could take $r = 0.5$. Here $|a_{n+1}/a_n| = (1 + 1/n)^{-n} < 0.5$ for all $n \ge 2$, so the estimate is valid for every $N \ge 1$. Then $1/(1 - r) = 2$, whence

\[ |R_N| < \frac{2\,N!}{(N + 1)^N}. \]

Hence $|R_4| < 48/625$.
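
The bound $48/625$ corresponds to $N = 4$, since $2\,|a_5| = 2 \cdot 5!/5^5 = 48/625$. The sketch below checks this and compares it with a truncated tail sum. It is illustrative only: the sign $(-1)^n$ of the general term is an assumption, and the tail is approximated by a finite number of extra terms rather than summed exactly.

```python
from fractions import Fraction
from math import factorial


def term(n):
    """General term assumed for Example 12-6 (a): (-1)**n * n! / n**n."""
    return Fraction((-1)**n * factorial(n), n**n)


def remainder_bound(N, r=Fraction(1, 2)):
    """|a_(N+1)| / (1 - r), the estimate of Corollary 12-5."""
    return abs(term(N + 1)) / (1 - r)


def remainder_estimate(N, extra_terms=60):
    """Approximate R_N by summing the next extra_terms terms of the tail."""
    return sum(term(n) for n in range(N + 1, N + 1 + extra_terms))


if __name__ == "__main__":
    print(remainder_bound(4))            # the bound 2|a_5| as an exact fraction
    print(float(remainder_estimate(4)))  # truncated tail, for comparison
```

Because the terms alternate in sign, the actual remainder is considerably smaller in magnitude than the bound, which ignores the cancellation between successive terms.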

In the so-called $n$th root test, appeal is also made to the geometric progression to prove convergence. Suppose the series $\sum a_n$ is such that

\[ \lim_{n \to \infty} \sqrt[n]{|a_n|} = L, \]

and that $L < 1$. Then if $r$ is some definite number such that $L < r < 1$, the existence of the limit implies that there exists an integer $N$ such that

\[ \sqrt[n]{|a_n|} < r \quad \text{for } n > N. \]

Hence $|a_n| < r^n$ for $n > N$. Thus, as with the ratio test, the remainder after $N$ terms may be overestimated by the sum of the absolute values of the remaining terms, and the result still further overestimated in terms of $|a_{N+1}|$ and a geometric progression with common ratio $r$. As $r < 1$ this remainder is finite, thereby establishing that $\sum a_n$ is absolutely convergent. If $L > 1$, then successive terms grow and the series is divergent. As with the ratio test, the $n$th root test fails when $L = 1$, for then $\sum a_n$ may be either convergent or divergent. Stated formally we have:

Theorem 12-6 ($n$th root test) If the series $\sum_{n=1}^{\infty} a_n$ is such that

\[ \lim_{n \to \infty} \sqrt[n]{|a_n|} = L, \]

then

(a) the series $\sum a_n$ is absolutely convergent if $L < 1$,

(b) the series $\sum a_n$ is divergent if $L > 1$,

(c) the test fails if $L = 1$.

Example 12-7

(a) Consider the series

\[ \sum_{n=1}^{\infty} \left(\frac{nk}{3n + 1}\right)^n, \]

where $k$ is a constant. Then

\[ a_n = \left(\frac{nk}{3n + 1}\right)^n \quad \text{and} \quad \lim_{n \to \infty} \sqrt[n]{|a_n|} = \lim_{n \to \infty} \frac{nk}{3n + 1} = \frac{k}{3}. \]

Thus the $n$th root test shows that the series will be convergent if $k < 3$ and divergent if $k > 3$. It fails if $k = 3$, though Theorem 12-2(a) then shows the series to be divergent.

(b) Consider the series $\sum_{n=1}^{\infty} n/2^n$.

Then $a_n = n/2^n = |a_n|$, and $\sqrt[n]{|a_n|} = \tfrac{1}{2}\sqrt[n]{n}$. Taking logarithms we find

\[ \log\left[\sqrt[n]{|a_n|}\right] = \log \tfrac{1}{2} + \frac{1}{n} \log n. \]

Now by Theorem 6-4 (b) we know that $\lim_{n \to \infty} (\log n)/n = 0$, so that

\[ \lim_{n \to \infty} \log\left[\sqrt[n]{|a_n|}\right] = \log \tfrac{1}{2}, \]

whence

\[ \lim_{n \to \infty} \sqrt[n]{|a_n|} = \tfrac{1}{2}. \]

As $\tfrac{1}{2} < 1$ the test thus proves convergence. In this instance it would have been simpler to use the ratio test to prove convergence.
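
The two limits in Example 12-7 can be approximated by evaluating $\sqrt[n]{|a_n|}$ for large $n$. This sketch is illustrative: the function names are hypothetical, $k > 0$ is assumed in part (a), and the roots are computed from $\log |a_n|$ so that neither overflow nor underflow occurs.

```python
import math


def nth_root_of_term(log_abs_term, n):
    """|a_n|**(1/n), computed from log|a_n| to avoid overflow and underflow."""
    return math.exp(log_abs_term(n) / n)


def log_term_a(n, k):
    """log|a_n| for Example 12-7 (a): a_n = (n k / (3 n + 1))**n, k > 0 assumed."""
    return n * math.log(n * k / (3 * n + 1))


def log_term_b(n):
    """log|a_n| for Example 12-7 (b): a_n = n / 2**n."""
    return math.log(n) - n * math.log(2)


if __name__ == "__main__":
    for n in (10, 1000, 10**6):
        root_a = nth_root_of_term(lambda m: log_term_a(m, 2.0), n)  # tends to k/3 = 2/3
        root_b = nth_root_of_term(log_term_b, n)                    # tends to 1/2
        print(n, root_a, root_b)
```

The slow approach of the second column to 1/2 reflects the factor $\sqrt[n]{n}$, which tends to 1 only as fast as $(\log n)/n$ tends to zero.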

If $\sum a_n$ is convergent by the $n$th root test, then we have seen that a number $N$ exists such that $|a_n| < r^n$ for $n > N$, where $0 < r < 1$. Hence we have

\[ R_N = \sum_{n=N+1}^{\infty} a_n \le |R_N| \le \sum_{n=N+1}^{\infty} |a_n| < \sum_{n=N+1}^{\infty} r^n = \frac{r^{N+1}}{1 - r}, \]

and so

\[ |R_N| < \frac{r^{N+1}}{1 - r}. \]

We express this overestimate of the remainder term as a corollary to the $n$th root test.

Corollary 12-6 ($R_N$ deduced from nth root test) Let $\sum_{n=1}^{\infty} a_n$ be convergent by
the nth root test with

$$\lim_{n\to\infty} \sqrt[n]{|a_n|} = L.$$

Then if $r$ is a number such that $L < r < 1$, and $N$ is large enough for
$\sqrt[n]{|a_n|} < r$ to hold for all $n > N$, the remainder $R_N$ after $N$ terms
is such that

$$|R_N| < \frac{r^{N+1}}{1-r}.$$

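We illustrate this result by applying the formula to the remainder after
three terms of the convergent series of Example 12-7 (b). In that case $L = \tfrac12$
so that we must choose $r$ such that $\tfrac12 < r < 1$. If we select $r = \tfrac58$, then
the formula gives

$$\frac{r^4}{1-r} = \frac{8}{3}\left(\frac{5}{8}\right)^4 = \frac{625}{1536}.$$

Had $r$ been chosen closer to the value $\tfrac12$, then a smaller value would have
been obtained. Thus by taking $r = 9/16$ the formula gives

$$\frac{r^4}{1-r} = \frac{6561}{28672}.$$

It must be stressed, however, that these figures bound $|R_N|$ only when $N$ is
large enough for $|a_n| < r^n$ to hold for all $n > N$. With $r = \tfrac58$ this first
happens for $n \ge 11$, so $N = 3$ is too small: indeed $a_4 = \tfrac14$ exceeds
$(\tfrac58)^4 \approx 0.153$, and the true remainder is $R_3 = \tfrac58$. A value of $r$
closer to $L$ gives a smaller estimate but requires a larger $N$.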
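The estimate of Corollary 12-6 rests on $|a_n| < r^n$ holding for every $n > N$, so it is worth checking that hypothesis before quoting a figure. The sketch below does this in exact rational arithmetic for the series $\sum n/2^n$. The helper names, the search limit of 200 terms, and the closed form $(N+2)/2^N$ for the tail (which can be verified by induction) are assumptions introduced for this illustration.

```python
from fractions import Fraction

def term(n):
    # a_n = n / 2**n, the series of Example 12-7 (b)
    return Fraction(n, 2**n)

def true_remainder(N):
    # Exact tail sum_{n > N} n/2**n, using the closed form (N + 2)/2**N
    return Fraction(N + 2, 2**N)

def corollary_bound(N, r):
    # Right-hand side r**(N+1) / (1 - r) of Corollary 12-6
    return r ** (N + 1) / (1 - r)

def smallest_valid_N(r, search_limit=200):
    # Last n <= search_limit at which a_n < r**n fails; within the searched
    # range the hypothesis of the corollary holds for every n beyond it.
    last_failure = 0
    for n in range(1, search_limit + 1):
        if term(n) >= r**n:
            last_failure = n
    return last_failure

r = Fraction(5, 8)
print("smallest admissible N:", smallest_valid_N(r))
print("N = 3 :", float(true_remainder(3)), "vs formula", float(corollary_bound(3, r)))
print("N = 10:", float(true_remainder(10)), "vs formula", float(corollary_bound(10, r)))
```

Under these assumptions the formula underestimates the remainder at $N = 3$, where the hypothesis fails, and overestimates it once $N$ is admissible.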



For our final result we prove that all series in which the signs of terms
alternate, whilst the absolute values of successive terms decrease monotonically
to zero, are convergent. Such series are called alternating series and are
of the general form

$$\sum_{n=1}^{\infty} (-1)^{n+1} a_n = a_1 - a_2 + a_3 - a_4 + \cdots,$$

where $a_n > 0$ for all $n$.

To prove our assertion of convergence we assume $a_1 > a_2 > a_3 > \cdots$
and $\lim a_n = 0$, and first consider the partial sum $S_{2r}$ corresponding to an
even number of terms $2r$. We write $S_{2r}$ in the form

$$S_{2r} = (a_1 - a_2) + (a_3 - a_4) + \cdots + (a_{2r-1} - a_{2r}).$$

Then, because $a_1 > a_2 > a_3 > \cdots$, it follows that $S_{2r} > 0$. By a slight
rearrangement of the brackets we also have

$$S_{2r} = a_1 - (a_2 - a_3) - (a_4 - a_5) - \cdots - (a_{2r-2} - a_{2r-1}) - a_{2r},$$

showing that as all the brackets and quantities are positive, $S_{2r} < a_1$. Hence,
as $S_{2r}$ is a bounded monotonic increasing sequence (each further bracket in
the first form adds a positive quantity), we know from Chapter 3 that it must
tend to a limit $S$, where

$$0 < S < a_1.$$

Next consider the partial sum $S_{2r+1}$ corresponding to an odd number of
terms $2r + 1$. We may write $S_{2r+1} = S_{2r} + a_{2r+1}$. Then, taking the limit of
$S_{2r+1}$ we have

$$\lim S_{2r+1} = \lim S_{2r} + \lim a_{2r+1} = S,$$

because by supposition $\lim a_{2r+1} = 0$. Thus both the partial sums $S_{2r}$ and
the partial sums $S_{2r+1}$ tend to the same limit $S$. Hence we have proved that
for $n$ both even and odd

$$\lim_{n\to\infty} S_n = S,$$

thereby showing that the series converges.

theorem 12-7 (alternating series test) The series $\sum_{n=1}^{\infty} (-1)^{n+1} a_n$ converges
if $a_n > 0$ and $a_{n+1} < a_n$ for all $n$ and, in addition,

$$\lim_{n\to\infty} a_n = 0.$$



Example 12-8 

(a) Consider the alternating series

$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{2^n} = \frac{1}{2} - \frac{1}{2^2} + \frac{1}{2^3} - \frac{1}{2^4} + \cdots$$

in which the absolute value of the general term $a_n = 1/2^n$. Then, as it is true that
$a_{n+1} < a_n$ and $\lim a_n = 0$, the test shows that the series is convergent.

(b) Consider the alternating series

$$\sum_{n=1}^{\infty} (-1)^{n+1}\sqrt[n+1]{2} = \sqrt{2} - \sqrt[3]{2} + \sqrt[4]{2} - \sqrt[5]{2} + \cdots,$$

in which the absolute value of the general term $a_n = \sqrt[n+1]{2}$. Now it is true that
$a_{n+1} < a_n$, but $\lim a_n = 1$, so that the last condition of the theorem is
violated rendering it inapplicable. Theorem 12-2 shows the series to be
divergent.

The form of argument that was used to show $0 < S_{2r} < a_1$ also shows
that

$$0 < \sum_{r=2m+1}^{\infty} (-1)^{r+1} a_r < a_{2m+1}$$

and, by a slight modification, that

$$-a_{2m} < \sum_{r=2m}^{\infty} (-1)^{r+1} a_r < 0.$$

As $R_{2m} = \sum_{r=2m+1}^{\infty} (-1)^{r+1} a_r$ is the remainder after an even number $2m$ of terms,
and $R_{2m-1} = \sum_{r=2m}^{\infty} (-1)^{r+1} a_r$ is the remainder after an odd number $2m - 1$ of
terms, it follows that if $N$ is either even or odd, then

$$0 < |R_N| < a_{N+1}.$$




Expressed in words this asserts that when an alternating series is terminated
after the Nth term, the absolute value of the error involved is less than
the magnitude $a_{N+1}$ of the next term.

Corollary 12-7 ($R_N$ for alternating series) If the alternating series $\sum_{n=1}^{\infty} (-1)^{n+1} a_n$
converges, and $R_N$ is the remainder after $N$ terms, then

$$0 < |R_N| < a_{N+1}.$$

Using the convergent alternating series in Example 12-8 (a) for purposes
of illustration we see that $a_n = 1/2^n$, and so the remainder $R_N$ must be such
that

$$0 < |R_N| < 1/2^{N+1}.$$

For example, termination of the summation of this series after five terms
would result in an error whose absolute magnitude is less than 1/64.
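The remainder bound for the series of Example 12-8 (a) can be confirmed exactly, since the series is geometric with first term $\tfrac12$ and common ratio $-\tfrac12$, giving the sum $\tfrac13$. The following sketch uses exact fractions; the function name is illustrative only.

```python
from fractions import Fraction

def partial_sum(N):
    # S_N for the series sum_{n>=1} (-1)**(n+1) / 2**n of Example 12-8 (a)
    return sum(Fraction((-1) ** (n + 1), 2**n) for n in range(1, N + 1))

# Geometric series with first term 1/2 and common ratio -1/2
S = Fraction(1, 2) / (1 + Fraction(1, 2))

# Each row shows N, |R_N| and the bound a_(N+1) = 1/2**(N+1).
for N in range(1, 8):
    remainder = S - partial_sum(N)
    print(N, abs(remainder), Fraction(1, 2 ** (N + 1)))
```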

A calculation involving the summation of a finite number of terms is 
often facilitated by grouping and interchanging their order. Although these 
operations are legitimate when the number of terms involved is finite, we 
must question their validity when dealing with an infinite number of terms. 
Later we shall show that the grouping of terms is permissible for any conver- 
gent series, but that rearrangement of terms is only permissible in a series 
when it is absolutely convergent, for only then does this operation leave the 
sum unaltered. 

An example will help here to indicate the dangers of manipulating a series 
without first questioning the legitimacy of the operations to be performed 
upon it. Consider the alternating series 

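$$1 - \tfrac12 + \tfrac13 - \tfrac14 + \tfrac15 - \tfrac16 + \cdots,$$

which is seen to be convergent by virtue of our last theorem, and denote its
sum by $S$. Then we have

$$S = 1 - \tfrac12 + \tfrac13 - \tfrac14 + \tfrac15 - \tfrac16 + \tfrac17 - \tfrac18 + \tfrac19 - \tfrac1{10} + \tfrac1{11} - \tfrac1{12} + \cdots$$

or, on rearranging the terms,

$$S = 1 - \tfrac12 - \tfrac14 + \tfrac13 - \tfrac16 - \tfrac18 + \tfrac15 - \tfrac1{10} - \tfrac1{12} + \cdots$$

$$= (1 - \tfrac12) - \tfrac14 + (\tfrac13 - \tfrac16) - \tfrac18 + (\tfrac15 - \tfrac1{10}) - \tfrac1{12} + \cdots$$

$$= \tfrac12 - \tfrac14 + \tfrac16 - \tfrac18 + \tfrac1{10} - \tfrac1{12} + \cdots$$

$$= \tfrac12 S.$$

This can only be true if $S = 0$, but clearly this is impossible because
Corollary 12-7 above shows that the error in the summation after only one
term is less than $\tfrac12$ and therefore $S$ is certainly positive with $\tfrac12 < S < 1$.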
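The effect of the rearrangement can be seen numerically. The sketch below sums the series in its natural order and in the rearranged order (one positive term followed by two negative ones). It relies on the standard fact, not established at this point in the text, that the natural-order sum is $\log 2$; the function names and term counts are illustrative.

```python
import math

def original_partial_sum(n_terms):
    # 1 - 1/2 + 1/3 - 1/4 + ... taken in its natural order
    return sum((-1) ** (n + 1) / n for n in range(1, n_terms + 1))

def rearranged_partial_sum(n_blocks):
    # Blocks 1/(2k-1) - 1/(4k-2) - 1/(4k): one positive term, two negative
    return sum(1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k)
               for k in range(1, n_blocks + 1))

print("natural order:", original_partial_sum(200_000), "log 2 =", math.log(2))
print("rearranged   :", rearranged_partial_sum(100_000), "half of log 2 =", math.log(2) / 2)
```

Both orderings use exactly the same terms, yet the partial sums settle near different values, which is the behaviour the text attributes to conditional convergence.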

What has gone wrong? The answer is that in a sense we are 'robbing Peter
to pay Paul'. This occurs because both the series $\sum 1/(2n + 1)$ and the series
$\sum 1/2n$ from which are derived the positive and negative terms in our series are
divergent, and we have so rearranged the terms that they are weighted in
favour of the negative ones. Other rearrangements could in fact be made to
yield any sum that was desired. In other words, we are working with a series
that is only conditionally convergent, and not absolutely convergent. It
would seem from this that perhaps if a series $\sum a_n$ is absolutely convergent,
then its terms should be capable of rearrangement and grouping without
altering the sum. Let us prove the truth of this conjecture, but first we prove
the simpler result that the grouping or bracketing of the terms of a convergent
series leaves its sum unaltered.

Suppose that $\sum a_n$ is a convergent series with sum $S$. Take as representative
of the possible groupings of its terms the series derived from $\sum a_n$ by the
insertion of parentheses (brackets) as indicated below:

$$(a_1 + a_2) + (a_3 + a_4 + a_5) + a_6 + (a_7 + a_8) + \cdots.$$

Now denote the bracketed terms by $b_1, b_2, \ldots$, where $b_1 = a_1 + a_2$,
$b_2 = a_3 + a_4 + a_5, \ldots$, so that we have associated a new series $\sum b_n$ with
the original series $\sum a_n$. If the nth partial sums of $\sum a_n$ and $\sum b_n$ are $S_n$ and $S'_n$,
respectively, then the partial sums $S'_1, S'_2, S'_3, S'_4, \ldots$ of $\sum b_n$ are, in
reality, the partial sums $S_2, S_5, S_6, S_8, \ldots$ of $\sum a_n$. As $\sum a_n$ is convergent to $S$
by hypothesis, any subsequence of its partial sums $\{S_n\}$ must also converge
to $S$. In particular this applies to the sequence $S_2, S_5, S_6, S_8, \ldots$, derived by
the inclusion of parentheses. Hence $\sum b_n$ is also convergent to the sum $S$,
which proves our result.

We now examine the effect of rearranging the terms of a series. Let $\sum a_n$
be absolutely convergent so that $\sum |a_n|$ must be convergent, and let $\sum b_n$ be
a rearrangement of $\sum a_n$. Then, as the terms of $\sum |b_n|$ are in one-to-one
correspondence with those of $\sum |a_n|$, every partial sum of $\sum |b_n|$ is bounded
above by the sum of $\sum |a_n|$, from which we deduce that $\sum b_n$ is also absolutely
convergent.

Next we must show that $\sum a_n$ and $\sum b_n$ have the same sum. If $S_n$ is the nth
partial sum of $\sum a_n$ which has the sum $S$, then by taking $n$ sufficiently large we
may make $|S_n - S|$ as small as we wish; say less than an arbitrarily small
positive number $\varepsilon$. As $\sum |a_n|$ is convergent we may, by increasing $n$ if
necessary, also arrange that the sum of $|a_k|$ over all $k > n$ is less than $\varepsilon$.
Now let $S'_m$ be the mth partial sum of $\sum b_n$. Then, as $S_n$
contains the first $n$ terms of $\sum a_n$, with their suffixes in sequential order, by
taking $m$ large enough we can obviously make $S'_m$ contain all the terms of $S_n$
together with $m - n$ additional terms $a_p, a_q, \ldots, a_r$, where $n < p < q < \cdots < r$.

Hence we may write

$$S'_m = S_n + a_p + a_q + \cdots + a_r,$$

whence

$$S'_m - S = S_n - S + a_p + a_q + \cdots + a_r.$$

Taking absolute values gives

$$|S'_m - S| \le |S_n - S| + |a_p| + |a_q| + \cdots + |a_r|.$$

Now, $n$ was chosen such that $|S_n - S| < \varepsilon$, so that

$$|S'_m - S| < \varepsilon + |a_p| + |a_q| + \cdots + |a_r|.$$

However, the remaining terms on the right-hand side of this inequality all
occur after $a_n$ in the series $\sum a_n$, and $n$ was also chosen so that the sum of
the absolute values of all such terms is less than $\varepsilon$. Their total contribution
therefore cannot exceed $\varepsilon$, and thus

$$|S'_m - S| < 2\varepsilon.$$

This shows that the mth partial sum of $\sum b_n$ converges to the sum $S$, so that
rearrangement of the terms of an absolutely convergent series is permissible
and does not affect its sum.

theorem 12-8 (grouping and rearrangement of series) If the series $\sum_{n=1}^{\infty} a_n$
is convergent, then parentheses may be inserted into the series without affecting
its sum. If, in addition, the series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, then its
terms may be rearranged without altering its sum.

Example 12-9 

(a) Consider the series

$$\sum_{m=1}^{\infty} \frac{1}{m(m+1)},$$

which is easily seen to be absolutely convergent by use of the comparison
test with $b_m = 1/m^2$. As absolute convergence obviously implies convergence,
the first part of Theorem 12-8 asserts that we may group terms by inserting
parentheses as we wish. So, using the identity

$$\frac{1}{m(m+1)} = \frac{1}{m} - \frac{1}{m+1},$$

we find for the nth partial sum $S_n$ the expression

$$S_n = \sum_{m=1}^{n} \left(\frac{1}{m} - \frac{1}{m+1}\right).$$

Now successive terms in this summation cancel, or telescope as the process
is sometimes called, leaving only the first and the last. This is best seen by
writing out the expression for $S_n$ in full as follows:

$$S_n = \left(1 - \frac12\right) + \left(\frac12 - \frac13\right) + \cdots + \left(\frac{1}{n-1} - \frac{1}{n}\right) + \left(\frac{1}{n} - \frac{1}{n+1}\right) = 1 - \frac{1}{n+1}.$$

Hence, if the sum of the series is $S$, we have

$$S = \lim_{n\to\infty} S_n = \lim_{n\to\infty}\left[1 - \frac{1}{n+1}\right] = 1.$$
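The telescoping of the partial sums in part (a) is easy to confirm in exact arithmetic. This is a minimal sketch with an illustrative function name; it simply adds the terms $1/m(m+1)$ directly and compares the result with $1 - 1/(n+1)$.

```python
from fractions import Fraction

def partial_sum(n):
    # Direct sum of 1/(m(m+1)) for m = 1, ..., n, with no telescoping applied
    return sum(Fraction(1, m * (m + 1)) for m in range(1, n + 1))

for n in (1, 2, 5, 10, 100):
    print(n, partial_sum(n), 1 - Fraction(1, n + 1))
```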

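(b) Consider the series

$$1 + 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{2^3} + \frac{1}{3^3} + \cdots,$$

which can be shown to be absolutely convergent by an extension of the nth
root test. (See Problem 12-14.) The second part of Theorem 12-8 is applicable,
so that we may rearrange terms and, denoting the sum by $S$, we obtain

$$S = \sum_{n=0}^{\infty} \frac{1}{2^n} + \sum_{n=0}^{\infty} \frac{1}{3^n} = \frac{1}{1 - \frac12} + \frac{1}{1 - \frac13} = 7/2.$$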



The use of parentheses in a divergent series can sometimes produce a 
convergent series and, conversely, when attempting to alter the form of 
a convergent series a divergent series may sometimes be produced 
inadvertently. 

For instance, taking Example 12-9 (a), we could have written

$$\sum_{n=1}^{\infty} \frac{1}{n(n+1)} = \sum_{n=1}^{\infty} \left(\frac{n+1}{n} - \frac{n+2}{n+1}\right)$$

$$= \sum_{n=1}^{\infty} \frac{n+1}{n} - \sum_{n=1}^{\infty} \frac{n+2}{n+1}$$

$$= 2 + \sum_{n=2}^{\infty} \frac{n+1}{n} - \sum_{n=2}^{\infty} \frac{n+1}{n} = 2,$$

which we know to be an incorrect result. The error is, of course, contained in
the step in which we attempt to equate an absolutely convergent series
with the difference between two divergent series.

12-2 Power series 

Up to now we have been concerned entirely with series that did not contain
the variable $x$. A more general type of series called a power series in $(x - x_0)$
has the general form

$$\sum_{n=0}^{\infty} a_n (x - x_0)^n = a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \cdots, \tag{12-1}$$

in which the coefficients $a_0, a_1, \ldots, a_n, \ldots$ are constants. When $x$ is assigned
some fixed value $\xi$, say, the power series Eqn (12-1) reduces to an ordinary





series of the kind discussed in the previous section, and so may be tested for 
convergence by any appropriate test mentioned there. 

For simplicity we now apply the ratio test to series Eqn (12-1), allowing $x$
to remain a free variable, in order to try to deduce the interval for $x$ in which
the series is absolutely convergent. If $\alpha_n(x)$ is the absolute value of the ratio
of the $(n + 1)$th term to the nth term as a function of $x$, we have

$$\alpha_n(x) = \left|\frac{a_{n+1}(x - x_0)^{n+1}}{a_n (x - x_0)^n}\right| = \left|\frac{a_{n+1}}{a_n}\right| |x - x_0|.$$

Now for any specific value of $x$, the ratio test asserts that the series will
be convergent if $\lim \alpha_n(x) < 1$, whence we must require

$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| |x - x_0| < 1.$$

Thus the least upper bound $r$, say, of the values of $|x - x_0|$ for which this is
true is given by

$$r = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| \tag{12-2}$$

provided that this limit exists.
The inequality

$$|x - x_0| < r \tag{12-3}$$

thus defines the x-interval $(x_0 - r, x_0 + r)$ within which the power series
Eqn (12-1) is absolutely convergent. For $x$ outside this interval the ratio test
shows that the power series must be divergent. (See Fig. 12-2.) The interval
itself is called the interval of convergence of the power series, and the number
$r$ is called the radius of convergence of the power series. The interval of
convergence has been deliberately displayed in the form of an open interval
because the ratio test can offer no information about the behaviour of the
series at the end points. In fact the power series may either be convergent or
divergent at these points.



Fig. 12-2 Interval of convergence. (Regions marked on the x-axis, from left to
right: Divergent; Absolutely convergent; Divergent. The right-hand end point
is marked $x_0 + r$.)

The radius of convergence of a power series can also be deduced from the
nth root test, when it is easily seen that

$$r = \lim_{n\to\infty} \frac{1}{\sqrt[n]{|a_n|}} \tag{12-4}$$

provided that this limit exists.



definition 12-2 (radius of convergence of power series) The radius of
convergence $r$ of the power series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ is defined either as:

$$r = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right|$$

or

$$r = \lim_{n\to\infty} \frac{1}{\sqrt[n]{|a_n|}},$$

provided that these limits exist.

Example 12-10

(a) Let us show that the series for the exponential function is absolutely
convergent for all real $x$. We have

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots,$$

in which the general term $a_n = 1/n!$.
Now

$$\left|\frac{a_n}{a_{n+1}}\right| = \frac{(n+1)!}{n!} = (n + 1),$$

so that

$$r = \lim_{n\to\infty} (n + 1) \to \infty.$$

We have thus proved that the power series for $e^x$ is absolutely convergent for
all real $x$. This was an example of a power series with an infinite radius of
convergence.

(b) Consider the series

$$x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots,$$

which reduces to the illustrative example following Corollary 12-7, when
$x = 1$. We shall see later that this is the power series expansion of $\log (1 + x)$.
Then, again applying limit (12-2), we have $a_n = (-1)^{n+1}/n$, and so

$$\left|\frac{a_n}{a_{n+1}}\right| = \frac{n+1}{n}.$$

Thus we have

$$r = \lim_{n\to\infty} \left(\frac{n+1}{n}\right) = 1.$$

Hence, the series is absolutely convergent for $|x| < 1$. As we already know
the series is convergent for $x = 1$, and divergent for $x = -1$ for then it
becomes the harmonic series with the signs of all terms reversed, we have
proved that the power series for $\log (1 + x)$ is convergent for
$-1 < x \le 1$, the convergence being absolute for $-1 < x < 1$. This was an
example of a power series with radius of convergence unity.

(c) Consider the series

$$1 + x + (2x)^2 + (3x)^3 + \cdots + (nx)^n + \cdots,$$

then $a_n = n^n$ so that

$$\frac{1}{\sqrt[n]{|a_n|}} = \frac{1}{n}.$$

Hence, from Eqn (12-4),

$$r = \lim_{n\to\infty} \frac{1}{n} = 0.$$

This series has zero radius of convergence and so is absolutely convergent
only when $x = 0$. That is to say this power series has a finite sum, and so is
convergent, only at the one point $x = 0$ on the real line.
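The three radii found in Example 12-10 can be illustrated by evaluating the quantities in Eqns (12-2) and (12-4) at a finite index. The sketch below is only an illustration: the function names and the indices used are arbitrary choices, and a finite index can suggest a limit but not prove it.

```python
import math
from fractions import Fraction

def ratio_estimate(coeff, n):
    # |a_n / a_(n+1)|, the quantity whose limit is r in Eqn (12-2)
    return abs(coeff(n) / coeff(n + 1))

def exp_coeff(n):
    return Fraction(1, math.factorial(n))      # Example 12-10 (a)

def log_coeff(n):
    return Fraction((-1) ** (n + 1), n)        # Example 12-10 (b)

def root_estimate_for_c(n):
    # Example 12-10 (c): a_n = n**n, so 1/|a_n|**(1/n) = exp(-log(n**n)/n),
    # the quantity whose limit is r in Eqn (12-4)
    return math.exp(-(n * math.log(n)) / n)

for n in (10, 100, 1000):
    print(n, ratio_estimate(exp_coeff, n), float(ratio_estimate(log_coeff, n)),
          root_estimate_for_c(n))
```

The first column grows without bound, the second approaches 1, and the third approaches 0, in keeping with the infinite, unit and zero radii found above.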

As a power series is yet another example of the representation of a function
of the variable $x$, it is reasonable to enquire how we may differentiate
and integrate functions that are so defined. For simplicity we will take $x_0 = 0$,
and work with the power series about the origin

$$f(x) = \sum_{n=0}^{\infty} a_n x^n. \tag{12-5}$$

This is no restriction because Eqn (12-1) can be brought into this form by
shifting the origin by means of the change of variable $t = x - x_0$. We will
assume that Eqn (12-5) has a radius of convergence $r > 0$.

Intuition suggests that the derivative of $f(x)$ could be obtained by
differentiating the right-hand side of Eqn (12-5) term by term and, similarly, that
$\int_0^x f(t)\,dt$ could be obtained by term by term integration. However, extreme
caution must be exercised in such matters for we have already seen that what
is legitimate for the sum of a finite number of terms is not necessarily
legitimate for an infinite series. Furthermore, we are now dealing with an infinite
series of functions, and not just an ordinary series. In fact we shall show that
termwise differentiation and integration of a power series is always
permissible when $x$ lies within the interval of convergence $-r < x < r$ of Eqn
(12-5).




The justification of termwise differentiation that we now offer is perhaps 
the most subtle and difficult proof to be found in this book. It has been in- 
cluded because differentiation of functions defined by a power series is 
fundamental to many branches of mathematics. In fact we have already 
employed termwise differentiation when deriving the series representation for 
e x in Chapter 6, and we shall use it again when discussing differential equa- 
tions. The proof of this result also serves to indicate how any study of the 
subject beyond this level must, of necessity, involve the notion of uniform 
convergence. This aspect of the proof is not emphasized here, since it is 
beyond the scope of a first account. 

Our object will be to prove that the function

$$F(x) = \sum_{n=1}^{\infty} n a_n x^{n-1} \tag{12-6}$$

is the derivative of the function $f(x)$ of Eqn (12-5), that is to say that $f'(x) = F(x)$.

First notice that Eqns (12-5) and (12-6) have the same radius of
convergence. This follows because, by hypothesis,

$$\lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = r,$$

and the ratio of the mth to the $(m + 1)$th coefficient of Eqn (12-6) is
$m a_m/(m + 1)a_{m+1}$, whence

$$\lim_{m\to\infty} \left|\frac{m a_m}{(m+1)a_{m+1}}\right| = \lim_{m\to\infty} \left(\frac{m}{m+1}\right) \cdot \lim_{m\to\infty} \left|\frac{a_m}{a_{m+1}}\right| = r.$$

Next, if $x$ and $x + h$ are points in the interval of convergence, form the
difference quotient

$$\frac{f(x + h) - f(x)}{h} = \sum_{n=1}^{\infty} a_n \left(\frac{(x + h)^n - x^n}{h}\right). \tag{12-7}$$

The grouping of terms on the right-hand side is permissible because of the
absolute convergence of the power series for $f(x)$ in $-r < x < r$.

Then, applying the mean value theorem for derivatives (Theorem 5-12) to
the general term on the right-hand side of Eqn (12-7), we have

$$(x + h)^n - x^n = h n \xi_n^{n-1},$$

where $x < \xi_n < x + h$ for $n = 1, 2, \ldots$. Thus we arrive at the result

$$\frac{f(x + h) - f(x)}{h} = \sum_{n=1}^{\infty} n a_n \xi_n^{n-1}. \tag{12-8}$$






Then, as Eqns (12-5) and (12-6) have the same radius of convergence, we may
consider the difference between Eqns (12-6) and (12-8), again using the fact
that absolute convergence permits rearrangement of terms to give

$$F(x) - \frac{f(x + h) - f(x)}{h} = \sum_{n=2}^{\infty} n a_n \left(x^{n-1} - \xi_n^{n-1}\right),$$

or

$$\left|F(x) - \frac{f(x + h) - f(x)}{h}\right| \le \sum_{n=2}^{\infty} n |a_n| \left|x^{n-1} - \xi_n^{n-1}\right|.$$

Let us again use the mean value theorem for derivatives to obtain the result

$$x^{n-1} - \xi_n^{n-1} = (n - 1)(x - \xi_n)\eta_n^{n-2},$$

where $x < \eta_n < \xi_n$. Then, as $|x - \xi_n| < |h|$, we have

$$\left|F(x) - \frac{f(x + h) - f(x)}{h}\right| < |h| \sum_{n=2}^{\infty} n(n-1) |a_n| |\eta_n|^{n-2}, \tag{12-9}$$

for $-r < x < r$.

Now the form of argument used to prove that the power series Eqn (12-6)
has radius of convergence $r$, also proves that the series on the right-hand side
of this inequality has radius of convergence $r$. So, allowing $h$ to tend to zero,
as the sum of the series is finite the right-hand side of Eqn (12-9) also tends
to zero whilst the difference quotient approaches $f'(x)$. Hence we have proved
our result. The difficult part of this proof was in showing that the right-hand
side of Eqn (12-9) can be made arbitrarily small independently of $x$ in the
interval of convergence. This is the property of uniform convergence
mentioned in Chapter 3.

As differentiability implies continuity we have, as an incidental result,
proved that a power series is continuous within its interval of convergence.
A more direct proof is indicated in Problem 12-19 at the end of the chapter.

The termwise integrability of power series is easier to prove. Denote by
$H(x)$ the series

$$H(x) = \sum_{n=0}^{\infty} \frac{a_n x^{n+1}}{n + 1} \tag{12-10}$$

which is obtained by termwise integration of Eqn (12-5). That is, formally,

$$H(x) = \int_0^x f(t)\,dt.$$

Now the ratio of the nth to the $(n + 1)$th coefficients of Eqn (12-10) is
$(n + 1)a_{n-1}/n a_n$, whence

$$\lim_{n\to\infty} \left|\frac{(n + 1)a_{n-1}}{n a_n}\right| = \lim_{n\to\infty} \left(\frac{n+1}{n}\right) \cdot \lim_{n\to\infty} \left|\frac{a_{n-1}}{a_n}\right| = r.$$

This shows that the power series Eqn (12-10) also has radius of convergence
$r$. We have just established that a power series is differentiable for $x$ within
its interval of convergence, so that $H'(x) = f(x)$ for $-r < x < r$. Thus by
the fundamental theorem of calculus

$$\int_0^x f(t)\,dt = H(x) - H(0) = H(x),$$

which was to be proved. Let us collect together these results into the form of
a theorem.

theorem 12-9 (differentiation and integration of power series) Let the
function $f(x)$ be defined by the power series

$$f(x) = \sum_{n=0}^{\infty} a_n x^n,$$

with radius of convergence $r > 0$. Then, within the common interval of
convergence $-r < x < r$,

(a) $f(x)$ is a continuous function;

(b) $f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}$;

(c) $\int_0^x f(t)\,dt = \sum_{n=0}^{\infty} \frac{a_n}{n + 1} x^{n+1}$.

Example 12-11 Find the radius and interval of convergence of

$$f(x) = \sum_{n=1}^{\infty} \frac{x^n}{n(n + 1)}.$$

Deduce $f'(x)$ and find its interval of convergence.

Solution The nth coefficient $a_n$ of the power series for $f(x)$ is $a_n = 1/n(n + 1)$,
and so the radius of convergence $r$ is given by

$$r = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty} \frac{n + 2}{n} = 1.$$

To specify the complete interval of convergence it remains to examine the
behaviour of the power series at the end points of the interval $-1 < x < 1$.

The series may be seen to be convergent at $x = 1$ by using the comparison
test with $b_n = 1/n^2$. When $x = -1$ the series becomes an alternating series
and is seen to be convergent by Theorem 12-7. Thus the complete interval of
convergence for $f(x)$ is $-1 \le x \le 1$.

Under the conditions of Theorem 12-9 (b) we may differentiate the power
series for $f(x)$ term by term within $-1 < x < 1$, so that

$$f'(x) = \sum_{n=1}^{\infty} \frac{x^{n-1}}{n + 1}.$$

To specify the complete interval of convergence for this new series which, by
Theorem 12-9 (b), is certainly convergent in $-1 < x < 1$, we must again
examine the end points of the interval $-1 < x < 1$. The series for $f'(x)$
becomes an alternating series when $x = -1$, and is convergent by Theorem
12-7. At $x = 1$ it becomes the harmonic series, and so is divergent. The
complete interval of convergence for $f'(x)$ is thus $-1 \le x < 1$. The effect of
termwise differentiation has been to produce divergence of the differentiated
series at the right-hand end point of an interval of convergence at which
$f(x)$ is convergent.
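A numerical check of Example 12-11 is sketched below under stated assumptions: the series are truncated after a finite number of terms, the derivative is approximated by a central difference with a step $h = 10^{-5}$ chosen for illustration, and the value $1 - \log 2$ quoted for the end point $x = -1$ comes from the standard series for $\log 2$ rather than from the text.

```python
import math

def f_partial(x, terms=200):
    # Truncated series for f(x) = sum x**n / (n(n+1))
    return sum(x**n / (n * (n + 1)) for n in range(1, terms + 1))

def f_prime_partial(x, terms=200):
    # Truncated termwise-differentiated series sum x**(n-1) / (n+1)
    return sum(x ** (n - 1) / (n + 1) for n in range(1, terms + 1))

x, h = 0.5, 1e-5
central_difference = (f_partial(x + h) - f_partial(x - h)) / (2 * h)
print("difference quotient:", central_difference)
print("termwise derivative:", f_prime_partial(x))

# End points: slow growth at x = 1 (divergence), a settled value at x = -1.
print("x =  1:", f_prime_partial(1.0, 100), f_prime_partial(1.0, 10_000))
print("x = -1:", f_prime_partial(-1.0, 10_000), "compare", 1 - math.log(2))
```

Inside the interval the two derivative estimates agree closely, whereas at $x = 1$ the partial sums keep increasing as more terms are taken.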

Example 12-12 Find the power series representation of arctan $x$ by
considering the integral

$$\arctan x = \int_0^x \frac{dt}{1 + t^2}.$$

Deduce a series expansion for $\tfrac14\pi$.

Solution An application of the Binomial Theorem to the function $(1 + a)^{-1}$
gives the result

$$\frac{1}{1 + a} = 1 - a + a^2 - a^3 + a^4 - \cdots,$$

for $-1 < a < 1$. Setting $a = t^2$ we arrive at the power series representation
of $(1 + t^2)^{-1}$,

$$\frac{1}{1 + t^2} = 1 - t^2 + t^4 - t^6 + t^8 - \cdots. \tag{A}$$

The conditions of Theorem 12-9 (c) apply, and we may integrate this power
series term by term to obtain

$$\arctan x = \int_0^x \frac{dt}{1 + t^2} = \int_0^x (1 - t^2 + t^4 - t^6 + t^8 - \cdots)\,dt$$

or,

$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots. \tag{B}$$

This is the desired power series for arctan $x$ and by the conditions of Theorem
12-9 (c) it is certainly convergent within the interval $-1 < x < 1$, which is
the interval of convergence of the original power series Eqn (A).

At each of the end points $x = \pm 1$ of this interval, the power series Eqn
(B) becomes an alternating series which is seen to be convergent by Theorem
12-7. Hence the interval of convergence of the integrated series Eqn (B) is
$-1 \le x \le 1$. Using the fact that $\arctan 1 = \tfrac14\pi$, we find

$$\tfrac14\pi = 1 - \tfrac13 + \tfrac15 - \tfrac17 + \cdots.$$
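Because the series for $\tfrac14\pi$ is alternating with terms decreasing to zero, Corollary 12-7 bounds the error after $N$ terms by the next term, $1/(2N + 1)$. The sketch below compares the actual error with that bound; the function name is illustrative, and `math.pi` is used as the reference value.

```python
import math

def leibniz_partial_sum(n_terms):
    # First n_terms terms of 1 - 1/3 + 1/5 - 1/7 + ...
    return sum((-1) ** n / (2 * n + 1) for n in range(n_terms))

# Each row shows N, the actual error, and the bound 1/(2N + 1).
for N in (1, 10, 100, 1000):
    error = abs(math.pi / 4 - leibniz_partial_sum(N))
    print(N, error, 1 / (2 * N + 1))
```

The slow decay of the bound shows why this series, although convergent, is a poor practical means of computing $\pi$.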

12-3 Taylor's theorem 

So far we have discussed the convergence properties of a function $f(x)$ which
is defined by a given power series. Let us now reverse this idea and enquire
how, when given a specific function $f(x)$, its power series representation may
be obtained. Otherwise expressed, we are asking how the coefficients $a_n$ in
the power series

$$f(x) = \sum_{n=0}^{\infty} a_n x^n \tag{12-11}$$

may be determined when $f(x)$ is some given function.

First, by setting $x = 0$, we discover that $f(0) = a_0$. Then, on the assumption
that the power series Eqn (12-11) has a radius of convergence $r > 0$,
differentiate it term by term to obtain

$$f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}, \tag{12-12}$$

for $-r < x < r$.

Again setting $x = 0$ shows that $f'(0) = a_1$. Differentiating Eqn (12-12)
again with respect to $x$ yields

$$f''(x) = \sum_{n=2}^{\infty} n(n - 1) a_n x^{n-2}, \tag{12-13}$$

from which we conclude $f''(0) = 2!\,a_2$.

Proceeding systematically in this manner gives the general result

$$f^{(n)}(x) = \sum_{m=n}^{\infty} m(m - 1) \cdots (m - n + 1) a_m x^{m-n}, \tag{12-14}$$

so that $f^{(n)}(0) = n!\,a_n$.

Thus the coefficients in power series Eqn (12-11) are determined by the
formula

$$a_n = \frac{f^{(n)}(0)}{n!} \tag{12-15}$$

for $n \ge 1$ and $a_0 = f(0)$.

Substituting these coefficients into Eqn (12-11) we finally arrive at the
power series

$$f(x) = f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \cdots + \frac{x^n}{n!} f^{(n)}(0) + \cdots. \tag{12-16}$$




The expression on the right-hand side of this equation is known as the
Maclaurin series for $f(x)$, and it presupposes that $f(x)$ is differentiable an
infinite number of times. To justify the use of the equality sign in Eqn (12-16)
it is, of course, necessary to test the series for convergence to verify that its
radius of convergence $r > 0$, and to show that $|f(x) - S_n(x)| \to 0$ as $n \to \infty$,
where $S_n(x)$ is the sum of the first $n$ terms of the Maclaurin series. We shall
return to this matter later.

To transform Eqn (12-16) into a power series in $(x - x_0)$ we set $x = x_0 + h$
and let $f(x_0 + h) = \phi(h)$. Then $\phi'(h) = f'(x_0 + h)$, $\phi''(h) = f''(x_0 + h)$,
$\ldots$, $\phi^{(n)}(h) = f^{(n)}(x_0 + h)$, $\ldots$. It thus follows that $\phi^{(n)}(0) = f^{(n)}(x_0)$ for
$n \ge 1$ and $\phi(0) = f(x_0)$. The Maclaurin series for $\phi(h)$ is

$$\phi(h) = \phi(0) + h\phi'(0) + \frac{h^2}{2!}\phi''(0) + \cdots + \frac{h^n}{n!}\phi^{(n)}(0) + \cdots,$$

or, reverting to the function $f$,

$$f(x) = f(x_0) + (x - x_0) f'(x_0) + \frac{(x - x_0)^2}{2!} f''(x_0) + \cdots + \frac{(x - x_0)^n}{n!} f^{(n)}(x_0) + \cdots. \tag{12-17}$$

Expressed in this form the expression on the right-hand side is called the
Taylor series for $f(x)$ about the point $x = x_0$.

Example 12-13 Find the Maclaurin series for $\log (1 + x)$ and $\log (1 - x)$.
Deduce the expansion for $\log [(1 + x)/(1 - x)]$.

Solution Setting $f(x) = \log (1 + x)$ we find

$$f'(x) = \frac{1}{1 + x}, \quad f''(x) = \frac{-1}{(1 + x)^2}, \quad \ldots, \quad f^{(n)}(x) = \frac{(-1)^{n-1}(n - 1)!}{(1 + x)^n},$$

and so

$$f^{(n)}(0) = (-1)^{n-1}(n - 1)!$$

for $n \ge 1$ and $f(0) = 0$. Combining this expression for $f^{(n)}(0)$ with Eqn
(12-16) gives for the Maclaurin series for $\log (1 + x)$,

$$\log (1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$

This has already been examined for convergence in Example 12-10 (b) and
found to be convergent in the interval $-1 < x \le 1$, the convergence being
absolute for $-1 < x < 1$.

In the case of the function $\log (1 - x)$ the same argument shows that

$$f^{(n)}(0) = -(n - 1)!$$

for $n \ge 1$ and $f(0) = 0$, so that the Maclaurin series for $\log (1 - x)$ has the
form

$$\log (1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \frac{x^4}{4} - \cdots.$$

This can readily be seen to have $-1 \le x < 1$ for its interval of convergence.
Using the fact that $\log \{(1 + x)/(1 - x)\} = \log (1 + x) - \log (1 - x)$
gives the desired result

$$\log \left(\frac{1 + x}{1 - x}\right) = 2\left(x + \frac{x^3}{3} + \frac{x^5}{5} + \frac{x^7}{7} + \cdots\right)$$

for $-1 < x < 1$.

Strictly speaking, we are not yet entitled to use the equality sign between
the function and its Maclaurin series, as we have not yet established the
convergence of the nth partial sum of the series to the function it represents. We
will do this later.
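Although equality has not yet been established at this stage of the text, the series for $\log[(1 + x)/(1 - x)]$ can at least be compared numerically with the function. In the sketch below the choice $x = \tfrac13$, for which $(1 + x)/(1 - x) = 2$, and the number of terms are illustrative assumptions.

```python
import math

def log_ratio_series(x, terms):
    # 2 (x + x**3/3 + x**5/5 + ...), truncated after the given number of terms
    return 2 * sum(x ** (2 * j + 1) / (2 * j + 1) for j in range(terms))

x = 1 / 3   # then (1 + x)/(1 - x) = 2
for terms in (1, 2, 5, 10, 20):
    print(terms, log_ratio_series(x, terms), math.log((1 + x) / (1 - x)))
```

Only odd powers appear and each term is smaller than the last by a factor of roughly $x^2$, so few terms are needed when $|x|$ is small.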

Example 12-14 Use Taylor's series to express the polynomial

$$P(x) = x^4 + 3x^3 + x^2 + 2x + 1$$

in terms of powers of $(x - 1)$.

Solution To utilize the Taylor series in Eqn (12-17) we must set $x_0 = 1$ and
$f(x) = P(x)$. Then a simple calculation shows that

$P(1) = 8$, $P'(1) = 17$, $P''(1) = 32$, $P'''(1) = 42$, $P^{(\mathrm{iv})}(1) = 24$ and
$P^{(n)}(1) = 0$ for $n \ge 5$.

Hence we arrive at the finite power series

$$P(x) = 8 + (x - 1)\cdot 17 + \frac{(x - 1)^2}{2!}\cdot 32 + \frac{(x - 1)^3}{3!}\cdot 42 + \frac{(x - 1)^4}{4!}\cdot 24,$$

or

$$P(x) = 8 + 17(x - 1) + 16(x - 1)^2 + 7(x - 1)^3 + (x - 1)^4.$$

The use of the equality sign is fully justified here since we are dealing with a
finite power series.
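The calculation in Example 12-14 can be reproduced mechanically by differentiating the coefficient list repeatedly and evaluating at $x_0$. The following is a sketch only: the representation of a polynomial as a list of coefficients in ascending powers, and the helper names, are choices made for this illustration.

```python
from fractions import Fraction
from math import factorial

def evaluate(coeffs, x):
    # Horner evaluation; coeffs[i] multiplies x**i
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def derivative(coeffs):
    # Coefficients of the derivative polynomial
    return [i * c for i, c in enumerate(coeffs)][1:]

def taylor_coefficients(coeffs, x0):
    # Returns [P(x0)/0!, P'(x0)/1!, P''(x0)/2!, ...] as in Eqn (12-17)
    result, current, n = [], list(coeffs), 0
    while current:
        result.append(Fraction(evaluate(current, x0), factorial(n)))
        current = derivative(current)
        n += 1
    return result

P = [1, 2, 1, 3, 1]   # 1 + 2x + x**2 + 3x**3 + x**4
print(taylor_coefficients(P, 1))
```

The list terminates of its own accord because every derivative beyond the fourth is identically zero, which is the reason the series is finite.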

It can happen that the derivatives of a function $f(x)$ are not defined at
$x = 0$ so that its formal Maclaurin series expansion cannot be obtained. In
this case, provided the function is infinitely differentiable at the point $x = x_0$,
then $f(x)$ may be expanded in a Taylor series about that point. Such a case is
discussed by the following simple example.

Example 12-15 Derive the nth derivative $f^{(n)}(x)$ of the function $f(x) = x \log x$,
and show that $f^{(n)}(0)$ is not defined. Deduce the Taylor series
expansion of $f(x)$ about the point $x = 1$.

Solution Direct differentiation shows that $f^{(1)}(x) = 1 + \log x$, $f^{(2)}(x) = 1/x$,
$f^{(3)}(x) = -1/x^2$, $f^{(4)}(x) = 2!/x^3$, $f^{(5)}(x) = -3!/x^4$, $\ldots$, and in general

$$f^{(n)}(x) = \frac{(-1)^n (n - 2)!}{x^{n-1}}$$

for $n \ge 2$. Hence it is clear that $f^{(n)}(0)$ is not defined for any $n$. However, the
numbers $f^{(n)}(1)$ are defined for all $n$ and $f^{(n)}(1) = (-1)^n (n - 2)!$ for $n \ge 2$
and $f(1) = 0$, $f^{(1)}(1) = 1$. The Taylor series for $x \log x$ can now be obtained
from Eqn (12-17) by making the identification $x_0 = 1$ and then using the
derivatives $f^{(n)}(1)$ which have just been computed. We find

$$x \log x = (x - 1) + \frac{(x - 1)^2}{1 \cdot 2} - \frac{(x - 1)^3}{2 \cdot 3} + \frac{(x - 1)^4}{3 \cdot 4} - \frac{(x - 1)^5}{4 \cdot 5} + \cdots,$$

which is the desired result. Again, we have used the equality sign without first
showing that the nth partial sum of the Taylor series converges to $x \log x$ as
$n \to \infty$.

Regarding this as a power series in the variable $t = (x - 1)$ we find that
the coefficient $a_n$ of the power $t^n$ is $a_n = (-1)^n/n(n - 1)$, whence the radius
of convergence

$$r = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty} \frac{n(n + 1)}{(n - 1)n} = 1.$$

The power series is thus absolutely convergent in the interval $-1 < t < 1$
or, equivalently, in $0 < x < 2$. The series is convergent when $x = 2$, because
then it becomes an alternating series. It is also convergent when $x = 0$ by
comparison with the series with the general term $b_n = 1/n^2$. In fact we can do
better than this when $x = 0$, for then we can actually sum the series. Aside
from the first term, which becomes $-1$, the sum of the remaining terms must
be $+1$ by virtue of Example 12-9 (a), showing that if the equality sign may be
believed, then

$$\lim_{x\to 0} (x \log x) = 0.$$

This is encouraging, because it is in agreement with the result which can be
obtained from Theorem 6-4 (b) by replacing $x$ by $1/x$. This would strongly
suggest that our series is in fact equal to $x \log x$ in the complete interval of
convergence $0 \le x \le 2$.
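The suggestion that the series represents $x \log x$ throughout its interval of convergence can be probed numerically. The sketch below evaluates truncated sums at an interior point and at both end points; the function name and the numbers of terms are illustrative, and at $x = 0$ the comparison is with the limiting value 0 rather than with a function value.

```python
import math

def x_log_x_series(x, terms=200):
    # (x-1) + sum_{n>=2} (-1)**n (x-1)**n / (n(n-1)), truncated at n = terms
    t = x - 1
    total = t
    for n in range(2, terms + 1):
        total += (-1) ** n * t**n / (n * (n - 1))
    return total

print("x = 1.5:", x_log_x_series(1.5), 1.5 * math.log(1.5))
print("x = 2  :", x_log_x_series(2.0, 1000), 2 * math.log(2))
print("x = 0  :", x_log_x_series(0.0, 1000), "(limit of x log x is 0)")
```

At $x = 0$ the truncated sum equals $-1/N$ after terms up to $n = N$, by the telescoping of Example 12-9 (a), so it approaches 0 only slowly.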

We have attempted to emphasize that although we have indicated how a
Maclaurin or Taylor series may be associated with a function $f(x)$ that is
infinitely differentiable, the general question of just exactly when the series
is equal to the function with which it is associated still remains open. To
indicate that an infinitely differentiable function need not be represented by
its Maclaurin series at more than a single point, despite the fact that the series is
convergent for all $x$, we examine the function (see Problem 6-10)

$$f(x) = \begin{cases} e^{-1/x^2} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases}$$

This function is easily seen to be infinitely differentiable, and to be such
that $f^{(n)}(0) = 0$ for all $n$. The Maclaurin series for $f(x)$ is thus

$$f(x) = 0 + 0 + 0 + \cdots,$$

which is clearly convergent for all $x$, yet it is only equal to the function $f(x)$
at the single point $x = 0$. Such behaviour is quite exceptional, yet the fact
that it is associated with a seemingly simple function justifies the caution
with which we must approach the question of equality between a function
and its power series expansion.
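A short numerical illustration may make the point concrete. This is only a sketch in Python; the names `flat` and `maclaurin_sum` are hypothetical and chosen for the illustration. Since every Maclaurin coefficient of the function is zero, every partial sum is zero, while the function itself is positive away from the origin.

```python
import math

def flat(x):
    """f(x) = exp(-1/x^2) for x != 0, and f(0) = 0."""
    return 0.0 if x == 0 else math.exp(-1.0 / (x * x))

def maclaurin_sum(x, terms=10):
    """All Maclaurin coefficients of flat() vanish, so each partial sum is 0."""
    return 0.0

for x in (0.0, 0.1, 0.5, 1.0):
    print(x, flat(x), maclaurin_sum(x))

# flat(x) tends to zero faster than any power of x, which is why
# every derivative at the origin vanishes.
print(flat(0.1) / 0.1 ** 10)
```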

On occasions, the computation of the $n$th derivative $f^{(n)}(x)$ is simplified
by employing Leibnitz's theorem as we now illustrate.

Example 12-16 If $f(x) = \cos(k \arccos x)$, and $f^{(n)}(x)$ denotes the $n$th
derivative of $f(x)$, show that

$$(1 - x^2)f^{(n+2)}(x) - (2n + 1)x f^{(n+1)}(x) - (n^2 - k^2)f^{(n)}(x) = 0,$$

for $n = 0, 1, \ldots$, where $f^{(0)}(x) = f(x)$. Deduce the Maclaurin series for $f(x)$.

Solution As $f(x) = \cos(k \arccos x)$, it follows by differentiation that

$$f'(x) = \frac{k \sin(k \arccos x)}{\sqrt{1 - x^2}}, \qquad f''(x) = \frac{-k^2 \cos(k \arccos x)}{1 - x^2} + \frac{xk \sin(k \arccos x)}{(1 - x^2)^{3/2}}.$$

A little manipulation shows that $f(x)$ satisfies the differential equation

$$(1 - x^2)f''(x) - xf'(x) + k^2 f(x) = 0$$

or,

$$(1 - x^2)f^{(2)}(x) - xf^{(1)}(x) + k^2 f^{(0)}(x) = 0.$$

Now differentiating this equation $n$ times, and using the symbolic
differentiation operator $D$, gives

$$D^n\left[(1 - x^2)f^{(2)}(x) - xf^{(1)}(x) + k^2 f^{(0)}(x)\right] = 0$$

or,

$$D^n\left[(1 - x^2)f^{(2)}(x)\right] - D^n\left[xf^{(1)}(x)\right] + D^n\left[k^2 f^{(0)}(x)\right] = 0.$$

Whence, employing Leibnitz's theorem (Theorem 5-16), this becomes

$$(1 - x^2)f^{(n+2)}(x) + n(-2x)f^{(n+1)}(x) + \frac{n(n-1)}{2!}(-2)f^{(n)}(x) - xf^{(n+1)}(x) - nf^{(n)}(x) + k^2 f^{(n)}(x) = 0,$$

showing that

$$(1 - x^2)f^{(n+2)}(x) - (2n + 1)xf^{(n+1)}(x) - (n^2 - k^2)f^{(n)}(x) = 0.$$

This is a differential equation, but setting $x = 0$ it reduces to a recurrence
relation for $f^{(n)}(0)$:

$$f^{(n+2)}(0) = (n^2 - k^2)f^{(n)}(0) \quad \text{for } n = 0, 1, 2, \ldots.$$

As $f^{(0)}(0) = f(0) = \cos(k \arccos 0) = \cos(\tfrac{1}{2}k\pi)$ and $f^{(1)}(0) = f'(0) =
k \sin(k \arccos 0) = k \sin(\tfrac{1}{2}k\pi)$, we have

$$f^{(2)}(0) = -k^2 f^{(0)}(0) = -k^2 \cos\frac{k\pi}{2},$$

$$f^{(4)}(0) = (2^2 - k^2)f^{(2)}(0) = -k^2(2^2 - k^2)\cos\frac{k\pi}{2},$$

$$f^{(6)}(0) = (4^2 - k^2)f^{(4)}(0) = -k^2(2^2 - k^2)(4^2 - k^2)\cos\frac{k\pi}{2},$$

and

$$f^{(3)}(0) = (1^2 - k^2)f^{(1)}(0) = k(1^2 - k^2)\sin\frac{k\pi}{2},$$

$$f^{(5)}(0) = (3^2 - k^2)f^{(3)}(0) = k(1^2 - k^2)(3^2 - k^2)\sin\frac{k\pi}{2},$$

$$f^{(7)}(0) = (5^2 - k^2)f^{(5)}(0) = k(1^2 - k^2)(3^2 - k^2)(5^2 - k^2)\sin\frac{k\pi}{2},$$

and so on.

The general expressions are

$$f^{(2m-1)}(0) = k(1^2 - k^2)(3^2 - k^2)\cdots((2m-3)^2 - k^2)\sin\frac{k\pi}{2},$$

$$f^{(2m)}(0) = -k^2(2^2 - k^2)(4^2 - k^2)\cdots((2m-2)^2 - k^2)\cos\frac{k\pi}{2},$$

from which we conclude that the Maclaurin series for $\cos(k \arccos x)$ has
the form

$$\cos(k \arccos x) = \cos\frac{k\pi}{2} + xk\sin\frac{k\pi}{2} - \frac{x^2}{2!}k^2\cos\frac{k\pi}{2} + \frac{x^3}{3!}k(1^2 - k^2)\sin\frac{k\pi}{2} - \frac{x^4}{4!}k^2(2^2 - k^2)\cos\frac{k\pi}{2} + \cdots.$$
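The recurrence $f^{(n+2)}(0) = (n^2 - k^2)f^{(n)}(0)$ lends itself to direct computation. The following Python sketch is an illustration only; the function name `maclaurin_cos_k_arccos` and the default of 40 terms are assumptions made here. It builds the derivatives at the origin from the recurrence and sums the resulting Maclaurin series.

```python
import math

def maclaurin_cos_k_arccos(k, x, terms=40):
    """Sum the first `terms` terms of the Maclaurin series of cos(k arccos x).

    The derivatives at 0 come from f^(n+2)(0) = (n^2 - k^2) f^(n)(0),
    started with f(0) = cos(k*pi/2) and f'(0) = k sin(k*pi/2).
    """
    derivs = [math.cos(0.5 * k * math.pi), k * math.sin(0.5 * k * math.pi)]
    for n in range(terms - 2):
        derivs.append((n * n - k * k) * derivs[n])
    return sum(d * x ** n / math.factorial(n) for n, d in enumerate(derivs))

# For k = 3 the odd chain terminates and the series reduces to 4x^3 - 3x.
print(maclaurin_cos_k_arccos(3, 0.3), 4 * 0.3 ** 3 - 3 * 0.3)

# For non-integer k the series is infinite; compare with the function itself.
print(maclaurin_cos_k_arccos(0.5, 0.3), math.cos(0.5 * math.acos(0.3)))
```

When $k$ is an integer one of the two chains of derivatives terminates, because the factor $(n^2 - k^2)$ vanishes at $n = k$, and the series becomes a polynomial of degree $k$.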



To make further progress it now becomes necessary for us to settle the
question of when a Maclaurin or Taylor series is really equal to the function
with which it is associated. Let the function $f(x)$ be infinitely differentiable
and have the Taylor series representation Eqn (12-17), and let $P_{n-1}(x)$ be
the sum of the first $n$ terms of the series terminating at the power $(x - x_0)^{n-1}$,
so that

$$P_{n-1}(x) = f(x_0) + (x - x_0)f'(x_0) + \frac{(x - x_0)^2}{2!}f''(x_0) + \cdots + \frac{(x - x_0)^{n-1}}{(n-1)!}f^{(n-1)}(x_0).$$

Then a necessary and sufficient condition that the Taylor series should
converge to $f(x)$ is obviously that

$$\lim_{n\to\infty}\left|f(x) - P_{n-1}(x)\right| = 0.$$

This suggests that to establish convergence we must examine the behaviour
of the remainder of the series after $n$ terms. To achieve this we now prove
Taylor's theorem, one form of which is stated below.

Theorem 12-10 (Taylor's theorem with a remainder) Let $f(x)$ be a function
which is differentiable $n$ times in the interval $a \le x \le b$. Then there exists a
number $\xi$, strictly between $a$ and $b$, such that

$$f(b) = f(a) + (b - a)f'(a) + \frac{(b - a)^2}{2!}f''(a) + \cdots + \frac{(b - a)^{n-1}}{(n-1)!}f^{(n-1)}(a) + \frac{(b - a)^n}{n!}f^{(n)}(\xi).$$

Proof The proof of Taylor's theorem we now offer will be based on Rolle's
theorem. Let $k$ be defined such that

$$f(b) = f(a) + (b - a)f'(a) + \cdots + \frac{(b - a)^{n-1}}{(n-1)!}f^{(n-1)}(a) + \frac{(b - a)^n}{n!}k,$$

and define the function $F(x)$ by the expression

$$F(x) = f(b) - f(x) - (b - x)f'(x) - \cdots - \frac{(b - x)^{n-1}}{(n-1)!}f^{(n-1)}(x) - \frac{(b - x)^n}{n!}k.$$

Then $F(b) = F(a) = 0$, and a simple calculation, in which all but two terms cancel in pairs, shows that

$$F'(x) = \frac{(b - x)^{n-1}}{(n-1)!}\left[k - f^{(n)}(x)\right].$$

Since, by hypothesis, $f^{(n-1)}(x)$ is differentiable in $a < x < b$, the function
$F(x)$ satisfies the conditions of Rolle's theorem, which asserts that there must
be a number $\xi$, strictly between $a$ and $b$, for which $F'(\xi) = 0$. As $a < \xi < b$,
the factor $(b - \xi)^{n-1} \neq 0$, so that we must have $k = f^{(n)}(\xi)$. This completes
the proof of Taylor's theorem.

If we identify $b$ with $x$ and $a$ with $x_0$, Taylor's theorem with a remainder
takes the form

$$f(x) = f(x_0) + (x - x_0)f'(x_0) + \cdots + \frac{(x - x_0)^{n-1}}{(n-1)!}f^{(n-1)}(x_0) + \frac{(x - x_0)^n}{n!}f^{(n)}(\xi), \tag{12-18}$$

where $x_0 < \xi < x$. For obvious reasons the last term of this expression is
called the remainder term and is usually denoted by $R_n(x)$. The form stated
here in which

$$R_n(x) = \frac{(x - x_0)^n}{n!}f^{(n)}(\xi), \tag{12-19}$$

with $x_0 < \xi < x$ is known as the Lagrange form of the remainder term.

When $x_0 = 0$ Eqn (12-18) reduces to Maclaurin's theorem with a Lagrange
remainder,

$$f(x) = f(0) + xf'(0) + \frac{x^2}{2!}f''(0) + \cdots + \frac{x^{n-1}}{(n-1)!}f^{(n-1)}(0) + \frac{x^n}{n!}f^{(n)}(\xi), \tag{12-20}$$

where $0 < \xi < x$.



Example 12-17 Find the Lagrange remainders $R_n(x)$ after $n$ terms in the
Maclaurin series expansions of $e^x$, $\sin x$, and $\cos x$. By showing that in each
case $R_n(x) \to 0$ as $n \to \infty$, prove that these functions are equal to their
Maclaurin series expansions.

Solution If $f(x) = e^x$, it is easily shown that Eqn (12-20) takes the form

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^{n-1}}{(n-1)!} + R_n(x),$$

where $R_n(x) = (x^n/n!)e^{\xi}$, and $0 < \xi < x$. Now $e^{\xi} < e^{|x|}$, and in connection
with Eqn (6-15) we proved that

$$\frac{x^n}{n!} < \frac{x^{R-1}}{(R-1)!}\left(\tfrac{1}{2}\right)^{n-R+1},$$

where $R$ is an integer greater than $2x$. Hence for any fixed $x$, $e^{|x|}$ is a finite
constant and $x^n/n! \to 0$ as $n \to \infty$. It follows from this that $R_n(x) \to 0$ as
$n \to \infty$. This provides an alternative verification of the results of Section 6-1.

If $f(x) = \sin x$, then the Maclaurin series with a Lagrange remainder
Eqn (12-20) becomes

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + \frac{x^n}{n!}\sin\left(\xi + \frac{n\pi}{2}\right),$$

where $0 < \xi < x$. The Lagrange remainder Eqn (12-19) is the last term

$$R_n(x) = \frac{x^n}{n!}\sin\left(\xi + \frac{n\pi}{2}\right).$$

Since $|\sin[\xi + (n\pi/2)]| \le 1$ we must have

$$|R_n(x)| \le \frac{|x|^n}{n!},$$

showing that $R_n(x) \to 0$ as $n \to \infty$. This establishes the convergence of the
Maclaurin series to $\sin x$, and the argument for the cosine function is exactly
similar.
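The Lagrange bound for the exponential function is easy to tabulate. The sketch below is illustrative Python, with the hypothetical names `exp_partial_sum` and `lagrange_bound`. It compares the actual error of the first $n$ terms with the estimate $|x|^n e^{|x|}/n!$, which follows from $e^{\xi} < e^{|x|}$.

```python
import math

def exp_partial_sum(x, n):
    """First n terms 1 + x + ... + x^(n-1)/(n-1)! of the Maclaurin series of e^x."""
    return sum(x ** r / math.factorial(r) for r in range(n))

def lagrange_bound(x, n):
    """Upper bound |x|^n e^|x| / n! for the Lagrange remainder R_n(x)."""
    return abs(x) ** n / math.factorial(n) * math.exp(abs(x))

x = 2.0
for n in (5, 10, 15, 20):
    err = abs(math.exp(x) - exp_partial_sum(x, n))
    print(n, err, lagrange_bound(x, n))
```

Both columns shrink rapidly with $n$, the bound always lying above the true error until rounding error in the floating point arithmetic takes over.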



Example 12-18 Establish that the Maclaurin series of $\log(1 + x)$ converges
to the function in the interval $-1 < x \le 1$.

Solution The Maclaurin series with a remainder is (see Example 12-13)

$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + \frac{(-1)^{n-2}x^{n-1}}{n-1} + R_n(x),$$

where the Lagrange remainder is

$$R_n(x) = \frac{(-1)^{n-1}x^n}{n(1 + \xi)^n}$$

with $0 < \xi < x$. For the interval $0 \le x \le 1$, we must have $0 < \xi < 1$ so that
$1 + \xi > 1$, and hence $(1 + \xi)^n > 1$. Thus $|R_n(x)| \le x^n/n \le 1/n \to 0$ as
$n \to \infty$, thereby proving convergence of the Maclaurin series to $\log(1 + x)$
for $0 \le x \le 1$.

We must proceed differently to prove convergence for the interval
$-1 < x < 0$. Set $y = -x$ and consider the interval $0 < y < 1$, in which
we may write

$$\log(1 + x) = \log(1 - y) = -\int_0^y \frac{dt}{1 - t}.$$

Using the identity

$$\frac{1}{1 - t} = 1 + t + t^2 + \cdots + t^{n-1} + \frac{t^n}{1 - t}$$

we have, after integration,

$$\log(1 - y) = -y - \frac{y^2}{2} - \frac{y^3}{3} - \cdots - \frac{y^n}{n} - \int_0^y \frac{t^n}{1 - t}\,dt.$$

Thus our remainder term is now expressed in the form of the integral

$$R_n(y) = -\int_0^y \frac{t^n}{1 - t}\,dt.$$

Now,

$$|R_n(y)| = \int_0^y \frac{t^n}{1 - t}\,dt < \frac{1}{1 - y}\int_0^y t^n\,dt = \frac{y^{n+1}}{(1 - y)(n + 1)} < \frac{1}{(1 - y)(n + 1)},$$

so that $|R_n(y)| \to 0$ as $n \to \infty$. This establishes convergence in the interval
$-1 < x < 0$. Taken together with the first result we have succeeded in
showing that the Maclaurin series of $\log(1 + x)$ converges to the function
itself in the interval $-1 < x \le 1$. This provides the justification for our
final result in Example 12-13.
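The integral estimate of the remainder can be compared with the true error. The following Python sketch is an illustration under stated assumptions: the names `log1m_partial` and `integral_bound` are introduced here, and the choice $y = 0.9$ is arbitrary. It uses the bound $y^{n+1}/[(1-y)(n+1)]$ for the error after the term in $y^n$.

```python
import math

def log1m_partial(y, n):
    """-(y + y^2/2 + ... + y^n/n), the series for log(1 - y) through y^n."""
    return -sum(y ** r / r for r in range(1, n + 1))

def integral_bound(y, n):
    """Bound y^(n+1) / ((1 - y)(n + 1)) on the integral form of the remainder."""
    return y ** (n + 1) / ((1 - y) * (n + 1))

y = 0.9  # corresponds to x = -0.9, close to the end of the interval
for n in (10, 50, 100, 200):
    err = abs(math.log(1 - y) - log1m_partial(y, n))
    print(n, err, integral_bound(y, n))
```

The closer $y$ is to 1, the larger the factor $1/(1-y)$ and the more terms are needed, which reflects the slow convergence near $x = -1$.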

When performing numerical calculations with Taylor series, the remainder
term provides information on the number of terms that must be retained in
order to attain any specified accuracy. Suppose, for example, we wished to
calculate $\sin 31°$ correct to five decimal places by means of Eqn (12-18).
Then first we would need to set $f(x) = \sin x$ to obtain

$$\sin x = \sin x_0 + (x - x_0)\cos x_0 - \frac{(x - x_0)^2}{2!}\sin x_0 + \cdots + \frac{(x - x_0)^{n-1}}{(n-1)!}\sin\left(x_0 + \frac{(n-1)\pi}{2}\right) + R_n(x),$$

where the remainder

$$R_n(x) = \frac{(x - x_0)^n}{n!}\sin\left(\xi + \frac{n\pi}{2}\right),$$

with $x_0 < \xi < x$.

As the arguments of trigonometric functions must be specified in radian
measure it is necessary to set $x$ equal to the radian equivalent of 31° and then
to choose a convenient value for $x_0$. We have 31° is equivalent to $\pi/6 + \pi/180$
radians, so that a convenient value for $x_0$ would be $x_0 = \pi/6$. This is, of
course, the radian equivalent of 30°. The remainder term $R_n(x)$ now becomes

$$R_n(x) = \left(\frac{\pi}{180}\right)^n \frac{1}{n!}\sin\left(\xi + \frac{n\pi}{2}\right),$$

whence

$$|R_n(x)| \le \left(\frac{\pi}{180}\right)^n \frac{1}{n!}.$$

For our desired accuracy we must have $|R_n(x)| < 5 \times 10^{-6}$. Hence $n$
must be such that

$$\left(\frac{\pi}{180}\right)^n \frac{1}{n!} < 5 \times 10^{-6}.$$

A short calculation soon shows this condition is satisfied for $n \ge 3$, so that
the expansion need only contain powers as far as $(x - x_0)^2$.
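The short calculation referred to above can be carried out as follows. This is a minimal Python sketch; the variable names are illustrative and not taken from the text. It searches for the smallest $n$ satisfying the remainder condition and then evaluates the corresponding Taylor polynomial about $x_0 = \pi/6$.

```python
import math

x0 = math.pi / 6       # 30 degrees in radians
step = math.pi / 180   # x - x0, i.e. 1 degree in radians
tol = 5e-6             # five decimal place accuracy

# Smallest n for which the remainder bound (pi/180)^n / n! is below tol.
n = 1
while step ** n / math.factorial(n) >= tol:
    n += 1
print("terms needed:", n)

# Taylor polynomial with n terms; the r-th derivative of sin x is sin(x + r*pi/2).
approx = sum(step ** r / math.factorial(r) * math.sin(x0 + r * math.pi / 2)
             for r in range(n))
print(approx, math.sin(math.radians(31)))
```

The values $\sin(\pi/6)$ and $\cos(\pi/6)$ are known exactly, which is what makes $x_0 = \pi/6$ a convenient expansion point in a hand calculation.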
The polynomial

$$P_{n-1}(x) = f(x_0) + (x - x_0)f'(x_0) + \cdots + \frac{(x - x_0)^{n-1}}{(n-1)!}f^{(n-1)}(x_0) \tag{12-21}$$

associated with Taylor's theorem as expressed in Eqn (12-18) is called a
Taylor polynomial of degree $(n - 1)$ about the point $x = x_0$. It is obviously
an approximating polynomial for the function $f(x)$ in the sense that
$|f(x) - P_{n-1}(x)| \to 0$ as $n \to \infty$ for all $x$ within the interval of convergence. Hence
$P_{n-1}(x)$ is strictly analogous to the $n$th partial sum used in the previous
section. By way of example, the Taylor polynomial $P_3(x)$ for the exponential
function $e^x$ about the point $x = 0$ is

$$P_3(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!},$$

whilst its general Taylor polynomial $P_n(x)$ about the point $x = 0$ is

$$P_n(x) = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!}.$$

Example 12-19 Evaluate the integral

$$I = \int_0^{0.2} e^{-x^2}\,dx$$

by approximating $e^{-x^2}$ by its Taylor polynomial $P_2(x)$ about the point $x = 0$.
Estimate the error involved in using this approximation.

Solution Setting $f(x) = e^{-x^2}$ it is a straightforward matter to show that
$f(0) = 1$, $f'(0) = 0$, $f''(0) = -2$ and $f'''(x) = 4x(3 - 2x^2)e^{-x^2}$. Hence
$P_2(x) = 1 - x^2$, and by Taylor's theorem with a remainder

$$e^{-x^2} = P_2(x) + \frac{x^3}{3!}f'''(\xi),$$

where $0 < \xi < 0.2$.
Now we have

$$\int_0^{0.2} P_2(x)\,dx = \int_0^{0.2} (1 - x^2)\,dx = 0.1973,$$

which is our approximate value for the integral. To assess the error $E$ we use
the fact that

$$E = \left|\int_0^{0.2} e^{-x^2}\,dx - \int_0^{0.2} P_2(x)\,dx\right| = \left|\int_0^{0.2}\left(e^{-x^2} - P_2(x)\right)dx\right| = \left|\int_0^{0.2}\frac{x^3}{3!}f'''(\xi)\,dx\right|.$$

In this expression, $\xi = \xi(x)$, because $0 < \xi < x$ and $x$ is itself integrated over
the interval $0 \le x \le 0.2$. Although the functional form of $\xi(x)$ is unknown,
we may obtain an overestimate of $E$ by replacing $f'''(\xi)$ by its greatest value
in the interval $0 \le x \le 0.2$. Using the fact that $f'''(x) = 4x(3 - 2x^2)e^{-x^2}$
and $\max|f'''(\xi)| = \max|f'''(x)|$ we estimate this latter quantity by assigning
to each of the three factors in $f'''(x)$ its maximum value. We find that

$$\max|f'''(x)| \le 0.8 \times 3 \times 1,$$

whence

$$E \le \frac{2.4}{3!}\int_0^{0.2} x^3\,dx = 0.00016 \approx 0.0002.$$
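The figures in this example can be reproduced, and the error estimate compared with the true error, by a few lines of Python. This is a sketch only: the variable names are illustrative, and the identity $\int_0^u e^{-x^2}dx = \tfrac{1}{2}\sqrt{\pi}\,\mathrm{erf}(u)$ is used as an outside reference value that does not appear in the text.

```python
import math

upper = 0.2

# Integral of the Taylor polynomial P2(x) = 1 - x^2 over [0, 0.2].
approx = upper - upper ** 3 / 3

# Error estimate: max|f'''| <= 0.8 * 3 * 1, times the integral of x^3 / 3!.
bound = (0.8 * 3 * 1) / math.factorial(3) * upper ** 4 / 4

# Reference value from the error function (an assumption of this sketch).
exact = 0.5 * math.sqrt(math.pi) * math.erf(upper)

print(round(approx, 4), bound, abs(exact - approx))
```

The true error is several times smaller than the estimate, as is to be expected when each factor of $f'''(x)$ is replaced by its own maximum.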



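In many books Theorem 12-10 is called the generalized mean value
theorem, since when $n = 1$ it reduces to the already familiar mean value
theorem derived in Chapter 5 (Theorem 5-12). Let us now derive the analogue
of Taylor's theorem with a remainder for a function of two variables.

Suppose that $f(x, y)$ has continuous partial derivatives up to those of
$n$th order, and consider the function

$$F(t) = f(a + ht, b + kt), \tag{12-22}$$

in which $a$, $b$, $h$, and $k$ are constants. Then $F(t) = f(x, y)$, where $x = a + ht$,
$y = b + kt$, and in the neighbourhood of $(a, b)$ we have

$$\frac{dF}{dt} = \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = h\frac{\partial f}{\partial x} + k\frac{\partial f}{\partial y}.$$

Write this result in the form

$$\frac{dF}{dt} = \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)f,$$

where the expression in parentheses is a partial differential operator with
respect to $x$ and $y$ and is not a function. It only generates a function when it
acts on a suitably differentiable function $f$. In consequence, differentiating
$r$ times, we have

$$\left(\frac{d}{dt}\right)^r F = \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^r f, \tag{12-23}$$

with the understanding that:

$$\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)f \equiv h\frac{\partial f}{\partial x} + k\frac{\partial f}{\partial y},$$

$$\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^2 f \equiv h^2\frac{\partial^2 f}{\partial x^2} + 2hk\frac{\partial^2 f}{\partial x\,\partial y} + k^2\frac{\partial^2 f}{\partial y^2},$$

$$\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^3 f \equiv h^3\frac{\partial^3 f}{\partial x^3} + 3h^2 k\frac{\partial^3 f}{\partial x^2\,\partial y} + 3hk^2\frac{\partial^3 f}{\partial x\,\partial y^2} + k^3\frac{\partial^3 f}{\partial y^3}.$$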

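Now $F(0) = f(a, b)$, $F(1) = f(a + h, b + k)$, and $F(t)$ is differentiable $n$
times for $0 \le t \le 1$. Consequently, by applying Theorem 12-10 to the
function $F(t)$ we obtain

$$F(1) = F(0) + F'(0) + \frac{1}{2!}F''(0) + \cdots + \frac{1}{(n-1)!}F^{(n-1)}(0) + \frac{1}{n!}F^{(n)}(\xi), \tag{12-24}$$

where $0 < \xi < 1$.

However, we also have

$$F^{(r)}(0) = \left.\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^r f\,\right|_{x=a,\ y=b} \quad\text{and}\quad F^{(n)}(\xi) = \left.\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^n f\,\right|_{x=a+\xi h,\ y=b+\xi k}, \tag{12-25}$$

whence by substitution of Eqn (12-25) into Eqn (12-24) we obtain:

$$f(a + h, b + k) = f(a, b) + hf_x(a, b) + kf_y(a, b) + \frac{1}{2!}\left.\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^2 f\,\right|_{x=a,\ y=b} + \cdots$$
$$\qquad + \frac{1}{(n-1)!}\left.\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^{n-1} f\,\right|_{x=a,\ y=b} + \frac{1}{n!}\left.\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^n f\,\right|_{x=a+\xi h,\ y=b+\xi k}, \tag{12-26}$$

where $0 < \xi < 1$.

This result is Taylor's theorem for a function $f(x, y)$ of two variables and
it is terminated with a Lagrange remainder term involving $n$th partial
derivatives. The result is also often known as the generalized mean value theorem
for a function of two variables. In particular, by taking $n = 1$ we obtain the
result

$$f(a + h, b + k) = f(a, b) + hf_x(a + \xi h, b + \xi k) + kf_y(a + \xi h, b + \xi k), \tag{12-27}$$

where $0 < \xi < 1$. This is the two variable analogue of Theorem 5-12 to
which it obviously reduces when $f = f(x)$, for then $f_y = 0$. Result Eqn
(12-26) is of such importance that it merits stating in the form of a theorem.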

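Theorem 12-11 (generalized mean value theorem in two variables) Let
$f(x, y)$ have continuous partial derivatives up to those of order $n$ in some
neighbourhood of the point $(a, b)$. Then if $(x, y)$ is any point within this
neighbourhood,

$$f(x, y) = f(a, b) + \left.\left((x - a)\frac{\partial}{\partial x} + (y - b)\frac{\partial}{\partial y}\right)f\,\right|_{(a,b)} + \cdots$$
$$\qquad + \frac{1}{(n-1)!}\left.\left((x - a)\frac{\partial}{\partial x} + (y - b)\frac{\partial}{\partial y}\right)^{n-1} f\,\right|_{(a,b)} + R_n(x, y),$$

where the Lagrange remainder

$$R_n(x, y) = \frac{1}{n!}\left.\left((x - a)\frac{\partial}{\partial x} + (y - b)\frac{\partial}{\partial y}\right)^n f\,\right|_{(\eta,\zeta)},$$

in which $\eta = a + \xi(x - a)$, $\zeta = b + \xi(y - b)$, and $0 < \xi < 1$.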



Example 12-20 Use the generalized mean value theorem in two variables
to expand the function

$$f(x, y) = e^{x + 2xy}$$

about the point $(0, 0)$. Terminate the expansion with the Lagrange remainder
term $R_3(x, y)$ and display its form.

Solution As the expansion is required about the point $(0, 0)$ we must set
$a = 0$, $b = 0$ in Theorem 12-11 and take $n = 3$. Routine calculation shows
that:

$$f(0, 0) = 1,\quad f_x(0, 0) = 1,\quad f_y(0, 0) = 0,\quad f_{xx}(0, 0) = 1,\quad f_{xy}(0, 0) = 2,\quad f_{yy}(0, 0) = 0,$$

whilst

$$f_{xxx}(x, y) = (1 + 2y)^3 e^{x+2xy},$$

$$f_{xxy}(x, y) = 2(1 + 2y)[2 + x(1 + 2y)]e^{x+2xy},$$

$$f_{yyx}(x, y) = 4x[2 + x(1 + 2y)]e^{x+2xy},$$

$$f_{yyy}(x, y) = 8x^3 e^{x+2xy}.$$

From Theorem 12-11 we find

$$e^{x+2xy} = 1 + x + \tfrac{1}{2}x^2 + 2xy + R_3(x, y),$$

where

$$R_3(x, y) = \frac{1}{3!}\left[x^3 f_{xxx}(x, y) + 3x^2 y f_{xxy}(x, y) + 3xy^2 f_{yyx}(x, y) + y^3 f_{yyy}(x, y)\right]_{(\eta,\zeta)}$$

with $\eta = \xi x$, $\zeta = \xi y$, and $0 < \xi < 1$.
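The expansion and the form of its remainder can be checked numerically. In the Python sketch below the names `p2` and `lagrange_r3` are hypothetical, and the test point $(0.1, 0.1)$ is an arbitrary choice. The actual remainder $f - P_2$ is compared with the Lagrange expression evaluated for a range of values of $\xi$ between 0 and 1; since the theorem asserts equality for some such $\xi$, the actual remainder should lie within the range of sampled values.

```python
import math

def f(x, y):
    return math.exp(x + 2 * x * y)

def p2(x, y):
    """Terms of the expansion up to second order about (0, 0)."""
    return 1 + x + 0.5 * x ** 2 + 2 * x * y

def lagrange_r3(x, y, xi):
    """Lagrange remainder R3 with third partials evaluated at (xi*x, xi*y)."""
    eta, zeta = xi * x, xi * y
    e = math.exp(eta + 2 * eta * zeta)
    s = 1 + 2 * zeta
    fxxx = s ** 3 * e
    fxxy = 2 * s * (2 + eta * s) * e
    fyyx = 4 * eta * (2 + eta * s) * e
    fyyy = 8 * eta ** 3 * e
    return (x ** 3 * fxxx + 3 * x ** 2 * y * fxxy
            + 3 * x * y ** 2 * fyyx + y ** 3 * fyyy) / 6

x, y = 0.1, 0.1
actual = f(x, y) - p2(x, y)
samples = [lagrange_r3(x, y, i / 100) for i in range(101)]
print(min(samples), actual, max(samples))
```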

12-4 Application of Taylor's theorem 

The applications of Taylor's theorem with a remainder are so numerous that 
we can do no more here than describe some of the most common. It is hoped 
that these illustrations will indicate the power of this theorem and the fact 
that its use is not confined exclusively to the estimation of errors in the series 
expansion of functions. 

12-4 (a) Indeterminate forms 

The form of L'Hospital's rule given in Theorem 5-14 is capable of immediate 
extension as follows. 

Theorem 12-12 (extended L'Hospital's rule) Let $f(x)$ and $g(x)$ be $n$ times
differentiable functions which are such that $f(a) = g(a) = 0$ and $f^{(r)}(a) =
g^{(r)}(a) = 0$ for $r = 1, 2, \ldots, n - 1$, but $\lim_{x\to a} f^{(n)}(x)$ and $\lim_{x\to a} g^{(n)}(x)$ are not
both zero.
Then

$$\lim_{x\to a}\frac{f(x)}{g(x)} = \frac{\lim_{x\to a} f^{(n)}(x)}{\lim_{x\to a} g^{(n)}(x)}.$$

Proof Using Taylor's theorem with a remainder to expand numerator and
denominator separately gives

$$\frac{f(a + h)}{g(a + h)} = \frac{f(a) + hf'(a) + \cdots + \dfrac{h^n}{n!}f^{(n)}(\xi_1)}{g(a) + hg'(a) + \cdots + \dfrac{h^n}{n!}g^{(n)}(\xi_2)} = \frac{f^{(n)}(\xi_1)}{g^{(n)}(\xi_2)},$$

where $a < \xi_1 < a + h$, $a < \xi_2 < a + h$. If now $h \to 0$, then $\xi_1, \xi_2 \to a$ and
we obtain the result of the theorem

$$\lim_{h\to 0}\frac{f(a + h)}{g(a + h)} = \frac{\lim_{x\to a} f^{(n)}(x)}{\lim_{x\to a} g^{(n)}(x)}.$$



Example 12-21 Find the value of the expression

$$\lim_{x\to 0}\frac{x \sin x}{(a^x - 1)(b^x - 1)}.$$

Solution This is an indeterminate form. Setting $f(x) = x \sin x$, $g(x) =
(a^x - 1)(b^x - 1)$, we first compute $f'(x)$ and $g'(x)$. We find $f'(x) = \sin x
+ x \cos x$ and $g'(x) = a^x \log a\,(b^x - 1) + b^x \log b\,(a^x - 1)$, and clearly
$\lim_{x\to 0} f'(x) = \lim_{x\to 0} g'(x) = 0$. The earlier form of L'Hospital's rule thus fails,
and we must make appeal to Theorem 12-12 and compute $f''(x)$ and $g''(x)$.
We find $f''(x) = 2\cos x - x \sin x$ and $g''(x) = 2a^x b^x \log a \log b +
a^x(\log a)^2(b^x - 1) + b^x(\log b)^2(a^x - 1)$, from which we see that $\lim_{x\to 0} f''(x) = 2$,
$\lim_{x\to 0} g''(x) = 2 \log a \log b$. By the conditions of Theorem 12-12 we have

$$\lim_{x\to 0}\frac{x \sin x}{(a^x - 1)(b^x - 1)} = \frac{1}{\log a \log b}.$$
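A numerical check of this limit is straightforward. The Python sketch below is illustrative only; the function name `ratio` and the values $a = 2$, $b = 3$ are assumptions made for the demonstration. It evaluates the quotient for decreasing values of $x$ and compares it with $1/(\log a \log b)$.

```python
import math

def ratio(x, a, b):
    """The quotient x sin x / ((a^x - 1)(b^x - 1)) for x != 0."""
    return x * math.sin(x) / ((a ** x - 1) * (b ** x - 1))

a, b = 2.0, 3.0
limit = 1 / (math.log(a) * math.log(b))
for x in (1e-1, 1e-2, 1e-3, 1e-4):
    print(x, ratio(x, a, b), limit)
```

The quotient approaches the limit with an error roughly proportional to $x$, so very small values of $x$ are not needed and would in any case suffer from cancellation in $a^x - 1$.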



12-4 (b) Local behaviour of functions of one variable 

In Chapter 5 we repeatedly turned to the problem of the local behaviour of a 
function of one variable in order to identify local maxima, local minima, and 
points of inflection. Here again Taylor's theorem with a remainder helps to 
identify such points when not only the first derivative, but also successive 
higher order derivatives vanish at a point. 



Suppose that $f(x)$ is $n$ times differentiable near $x = a$ and that $f^{(1)}(a) =
f^{(2)}(a) = \cdots = f^{(n-1)}(a) = 0$, but that $f^{(n)}(a) \neq 0$. Then by Taylor's
theorem

$$f(a + h) = f(a) + hf^{(1)}(a) + \cdots + \frac{h^{n-1}}{(n-1)!}f^{(n-1)}(a) + \frac{h^n}{n!}f^{(n)}(\xi),$$

where $a < \xi < a + h$, but because of the vanishing of the first $(n - 1)$
derivatives at $x = a$ this simplifies to

$$f(a + h) - f(a) = \frac{h^n}{n!}f^{(n)}(\xi).$$

The behaviour of the left-hand side of this expression was used in Chapter
5 to identify the nature of the extrema involved so that we see its sign is now
determined solely by the sign of $h^n f^{(n)}(\xi)$ or, for suitably small $h$, by the sign
of $h^n f^{(n)}(a)$. It is left to the reader to verify that the following theorem is an
immediate consequence of this simple result when taken in conjunction with
Definition 5-4.

Theorem 12-13 (identification of local extrema: one independent variable)
A necessary and sufficient condition that a suitably differentiable function
$f(x)$ have a local maximum (respectively, minimum) at $x = a$ is that the first
derivative $f^{(n)}(x)$ with a non-zero value at $x = a$ be of even order, with
$f^{(n)}(a) < 0$ (respectively, $f^{(n)}(a) > 0$). If the
first derivative other than $f^{(1)}(a)$ with a non-zero value at $x = a$ is of odd
order, then $f(x)$ has a point of inflection with an associated zero gradient
at $x = a$.
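The test described in the theorem can be expressed as a short routine. The Python sketch below is an illustration, not part of the text: the function name `classify`, the tolerance, and the convention that `derivs[r-1]` holds $f^{(r)}(a)$ are all assumptions made here.

```python
def classify(derivs, tol=1e-12):
    """Classify x = a from the values f'(a), f''(a), ... given in `derivs`.

    derivs[r - 1] holds the r-th derivative of f at a.
    """
    if abs(derivs[0]) > tol:
        return "not stationary"
    for order, value in enumerate(derivs, start=1):
        if abs(value) > tol:
            if order % 2 == 1:
                return "inflection"  # first non-zero derivative of odd order
            return "maximum" if value < 0 else "minimum"
    return "undetermined"  # all supplied derivatives vanish

print(classify([0, 0, 0, 24]))  # f(x) = x^4 at a = 0
print(classify([0, 0, 6]))      # f(x) = x^3 at a = 0
print(classify([0, -2]))        # f(x) = -x^2 at a = 0
```

The derivative values must be supplied by the user; the routine only applies the parity and sign rules to them.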



12-4 (c) Error estimate for Simpson's rule 

In Chapter 7 it was shown that if

$$I = \int_{x_0 - h}^{x_0 + h} f(x)\,dx,$$

then Simpson's rule for the approximate calculation of $I$ was

$$I \approx \frac{h}{3}\left(f(x_0 - h) + 4f(x_0) + f(x_0 + h)\right).$$

The error $E(h)$ is a function of the interval length $h$ and by definition

$$E(h) = \frac{h}{3}\left(f(x_0 - h) + 4f(x_0) + f(x_0 + h)\right) - \int_{x_0 - h}^{x_0 + h} f(x)\,dx.$$

Differentiating with respect to $h$ and using Theorem 7-8 to differentiate
the integral which is a function of its upper limit gives

$$E'(h) = \frac{1}{3}\left(f(x_0 - h) + 4f(x_0) + f(x_0 + h)\right) + \frac{h}{3}\left(-f'(x_0 - h) + f'(x_0 + h)\right) - \left(f(x_0 + h) + f(x_0 - h)\right),$$

whence $E'(0) = 0$.

Differentiating again yields

$$E''(h) = \frac{h}{3}\left(f''(x_0 + h) + f''(x_0 - h)\right) + \frac{1}{3}\left(f'(x_0 - h) - f'(x_0 + h)\right),$$

whence $E''(0) = 0$.

Finally, one further differentiation gives

$$E'''(h) = \frac{h}{3}\left(f'''(x_0 + h) - f'''(x_0 - h)\right).$$

Now setting $n = 1$ in Taylor's theorem with a remainder and applying
it to the function $f'''(x)$ on the interval $x_0 - h \le x \le x_0 + h$ gives

$$f'''(x_0 + h) = f'''(x_0 - h) + 2hf^{(4)}(\xi),$$

where $x_0 - h < \xi < x_0 + h$. Using this result in $E'''(h)$ shows that

$$E'''(h) = \frac{2h^2}{3}f^{(4)}(\xi).$$

Now

$$\int_0^h E'''(t)\,dt = E''(h) - E''(0) = E''(h),$$

so that assigning to $|f^{(4)}(\xi)|$ the maximum value $M$ of $|f^{(4)}(x)|$ in $x_0 - h
\le x \le x_0 + h$ it follows that

$$E''(h) \le \int_0^h \frac{2t^2}{3}M\,dt = \frac{2h^3 M}{9}.$$

A further integration using the fact that $E'(0) = 0$ gives

$$E'(h) \le \int_0^h \frac{2t^3 M}{9}\,dt = \frac{h^4 M}{18},$$

after which one final integration using the obvious fact that $E(0) = 0$ yields

$$E(h) \le \int_0^h \frac{t^4 M}{18}\,dt = \frac{h^5 M}{90}.$$

This is our desired error estimate, and as $M = \max|f^{(4)}(x)|$ for $x_0 - h
\le x \le x_0 + h$, it shows that contrary to expectation, Simpson's rule is
exact for any polynomial up to and including degree 3. This result is
surprising because Simpson's rule was based on the fitting of a quadratic at three
equally spaced points.

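Suppose for example that we desired to calculate

$$I = \int_0^{\pi/4} \sin x\,dx$$

using Simpson's rule with only three points. Then $f(x) = \sin x$ and $f^{(4)}(x)
= \sin x$, so that if $M = \max|f^{(4)}(x)|$ for $0 \le x \le \tfrac{1}{4}\pi$, then $M = 1/\sqrt{2}$.
We have $h = \tfrac{1}{8}\pi$, so that the error incurred

$$E(\tfrac{1}{8}\pi) \le \left(\tfrac{1}{8}\pi\right)^5 \frac{1}{90\sqrt{2}} = 7.3 \times 10^{-5}.$$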
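The estimate may be compared with the error actually incurred. The following Python sketch is an illustration under stated assumptions: the name `simpson_three_point` is hypothetical, and the integral is taken over $0 \le x \le \tfrac{1}{4}\pi$, so that $x_0 = h = \tfrac{1}{8}\pi$. It applies the three-point rule and evaluates the bound $h^5 M/90$.

```python
import math

def simpson_three_point(f, x0, h):
    """Simpson's rule on [x0 - h, x0 + h] using the three points x0 - h, x0, x0 + h."""
    return h / 3 * (f(x0 - h) + 4 * f(x0) + f(x0 + h))

h = math.pi / 8
x0 = math.pi / 8                        # interval [0, pi/4]
approx = simpson_three_point(math.sin, x0, h)
exact = 1 - math.cos(math.pi / 4)       # integral of sin x from 0 to pi/4
bound = h ** 5 * (1 / math.sqrt(2)) / 90

print(approx, exact, abs(approx - exact), bound)

# The rule is exact for a cubic: the integral of x^3 over [0, 2] is 4.
print(simpson_three_point(lambda x: x ** 3, 1.0, 1.0))
```

The actual error is a little over half the bound, which is reasonable since $\sin x$ is smaller than $M$ over most of the interval.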



12-4 (d) Newton's method 

Newton's method is a simple and powerful method for the accurate
determination of the roots of an equation $f(x) = 0$, and is based on Taylor's
theorem with the Lagrange remainder $R_2(x)$.

Suppose $x_0$ is an approximate root of $f(x) = 0$ and $h$ is such that $x =
x_0 + h$ is an exact root. Then by Taylor's theorem

$$f(x_0 + h) = f(x_0) + hf'(x_0) + \frac{h^2}{2!}f''(\xi),$$

where $x_0 < \xi < x_0 + h$.

As, by supposition, $f(x_0 + h) = 0$ we find

$$0 = f(x_0) + hf'(x_0) + \frac{h^2}{2}f''(\xi).$$

Now $\xi$ is not known, but on the assumption that $h$ is small we may define a
first approximation $h_1$ to $h$ by neglecting the third term and writing

$$h_1 = -\frac{f(x_0)}{f'(x_0)}.$$

The next approximation to the root itself must be $x_1 = x_0 + h_1$, whence by
the same argument, the approximation $h_2$ to the correction needed to make
$x_1$ an exact root is

$$h_2 = -\frac{f(x_0 + h_1)}{f'(x_0 + h_1)}.$$

Proceeding in this manner we find that the $n$th approximation $x_n$ to the exact
root of $f(x) = 0$ is, in terms of the $(n - 1)$th approximation $x_{n-1}$,

$$x_n = x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})}.$$

The successive calculation of improved approximations in this manner is
called iteration, and $x_n$ itself is called the $n$th iterate.

If the sequence $\{x_n\}$ tends to a limit $x^*$, it follows that this limit must be
the desired root, for then the numerator of the correction term vanishes. The
choice of an approximate root $x_0$ with which to start the process may be
made in any convenient manner. The most usual method is to seek to show
that the root lies between two fairly close values $x = a$, $x = b$ and then to
take for $x_0$ any value that is intermediate between them. The numbers $a$, $b$
are usually found by direct calculation, which is used to prove that $f(a)$ and
$f(b)$ are of opposite sign, so that by the intermediate value theorem a zero of
$y = f(x)$ must occur in the interval $a < x < b$.

The reasons for both the success and failure of Newton's method are
best appreciated in geometrical terms. The calculation of $x_n$ from $x_{n-1}$
amounts to tracing back the tangent to the curve $y = f(x)$ at $x_{n-1}$ until it
intersects the $x$-axis at the point $x_n$. If $x_n$ lies between $x_{n-1}$ and $x^*$ for all $n$
then the process converges; otherwise it may diverge. Fig. 12-3 (a) illustrates a
convergent iteration and Fig. 12-3 (b) a divergent one.








Fig. 12-3 (a) Convergent Newton iteration process; (b) divergent Newton iteration
process.

Example 12-22 Locate the real root of the cubic

$$x^3 + x^2 + 2x + 1 = 0.$$

Use the result to find the remaining roots.

Solution Setting $f(x) = x^3 + x^2 + 2x + 1$ we see that $f(0) = 1 > 0$ and
$f(-1) = -1 < 0$, so that by the intermediate value theorem a root of the
equation $f(x) = 0$ must lie in the interval $-1 < x < 0$. Take $x_0 = -0.5$,
since this lies within the desired interval.

Now $f'(x) = 3x^2 + 2x + 2$ so that Newton's method requires us to
employ the relation

$$x_n = x_{n-1} - \frac{x_{n-1}^3 + x_{n-1}^2 + 2x_{n-1} + 1}{3x_{n-1}^2 + 2x_{n-1} + 2},$$

starting with $x_0 = -0.5$.

A straightforward calculation shows that to four decimal places $x_1 =
-0.5714$, $x_2 = -0.5698$, and $x_3 = -0.5698$. The iteration process has thus
converged to within the required accuracy in only three iterations. The real
root is $x^* = -0.5698$, and the remaining two roots can now be found by
dividing $f(x)$ by the factor $(x + 0.5698)$ and then solving the remaining
quadratic in the usual manner. If this is done, long division gives

$$\frac{x^3 + x^2 + 2x + 1}{x + 0.5698} = x^2 + 0.4302x + 1.7549,$$

from which we find the other two roots are

$$x = -0.2151 + i\,1.3071 \quad\text{and}\quad x = -0.2151 - i\,1.3071.$$
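The calculation in this example is easily mechanized. The Python sketch below is illustrative: the helper names `f_prime` and `newton` are assumptions, and the quadratic factor is obtained here by comparing coefficients rather than by long division. It performs three Newton iterations from $x_0 = -0.5$ and then solves the remaining quadratic.

```python
import cmath

def f(x):
    return x ** 3 + x ** 2 + 2 * x + 1

def f_prime(x):
    return 3 * x ** 2 + 2 * x + 2

def newton(x, steps):
    """Return the list [x0, x1, ..., x_steps] of Newton iterates."""
    iterates = [x]
    for _ in range(steps):
        x = x - f(x) / f_prime(x)
        iterates.append(x)
    return iterates

iterates = newton(-0.5, 3)
root = iterates[-1]
print([round(v, 4) for v in iterates])

# Deflate: x^3 + x^2 + 2x + 1 = (x - root)(x^2 + p x + q),
# so comparing coefficients gives p = 1 + root and q = 2 + p * root.
p = 1 + root
q = 2 + p * root
disc = cmath.sqrt(p * p - 4 * q)
complex_roots = ((-p + disc) / 2, (-p - disc) / 2)
print(complex_roots)
```

A fixed number of steps is used only to mirror the hand calculation; in practice one would stop when successive iterates agree to the required accuracy.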

12-5 Applications of the generalized mean value 
theorem 

The applications of the extension of Taylor's theorem to functions of two or 
more variables are perhaps even more extensive than those of Taylor's 
theorem itself. This section illustrates a few of the simplest and most used, 
connected mainly with functions of two variables. The final application, 
connected with the least squares fitting of a polynomial, is the only one con- 
cerning functions of more than two variables. 

12-5 (a) Stationary points of functions of two variables 

Consider the function $z = f(x, y)$ of the two real independent variables $x$, $y$
which is defined in some region $D$ of the $(x, y)$-plane bounded by the curve $\gamma$.
The notion of its graph is already familiar to us and it comprises a surface $S$
with points $(x, y, f(x, y))$, the projection of the boundary $\Gamma$ of which onto
the $(x, y)$-plane is the curve $\gamma$. A typical situation is shown in Fig. 12-4 (a, b)
where the point $P$ is obviously a maximum and the point $Q$ is obviously a
minimum.

Intuitively, and by analogy with the single variable case, it would seem
that all that is necessary to locate extrema such as $P$, $Q$ is to find those points
$(x_0, y_0)$ at which $f_x(x_0, y_0) = f_y(x_0, y_0) = 0$. This is, in effect, saying that the
tangent plane at either a maximum or a minimum must be parallel to the
$(x, y)$-plane. Unfortunately, this is not a sufficiently stringent condition, for
the point $R$ in Fig. 12-5 is neither a maximum, nor a minimum, yet the tangent
plane at that point is certainly parallel to the $(x, y)$-plane. Because of the
shape of the surface it is called a saddle point. It is characterized by the fact
that if the surface is sectioned through $R$ by different planes parallel to the
$z$-axis, then for some the curve of section has a minimum at $R$ and for others
a maximum.

Fig. 12-4 (a) Surface having maximum at P; (b) surface having minimum at Q.

Each of these points $P$, $Q$, $R$ is called a stationary point of the function
$z = f(x, y)$ because $f_x$ and $f_y$ vanish at these points.



Definition 12-3 (stationary points of $f(x, y)$) Let $f(x, y)$ be a differentiable
function in some region $D$ of the $(x, y)$-plane. Then any point $(x_0, y_0)$ in
$D$ for which $f_x(x_0, y_0) = 0$ and $f_y(x_0, y_0) = 0$ is called a stationary point of
the function $f(x, y)$ in $D$.

If for all $(x, y)$ near $(x_0, y_0)$ it is true that $f(x, y) < f(x_0, y_0)$, then $f(x, y)$
will be said to have a local maximum at $(x_0, y_0)$. If for all $(x, y)$ near to $(x_0, y_0)$
it is true that $f(x, y) > f(x_0, y_0)$, then $f(x, y)$ will be said to have a local
minimum at $(x_0, y_0)$. In the event that $f(x, y)$ assumes values both greater
and less than $f(x_0, y_0)$ for $(x, y)$ near to a stationary point $(x_0, y_0)$, then
$f(x, y)$ will be said to have a saddle point at $(x_0, y_0)$.

We now use the generalized mean value theorem to prove the following 
result. 



Theorem 12-14 (identification of extrema of $f(x, y)$) Let $f(x, y)$ be a
function with continuous fi